We consider the occupancy problem where balls are thrown independently
at infinitely many boxes with fixed positive frequencies. It
is well known that the random number of boxes occupied by the first 
n balls is asymptotically normal if its variance Vn tends to infinity. In 
this work, we mainly focus on the opposite case where Vn is bounded, 
and derive a simple necessary and sufficient condition for convergence 
of Vn to a finite limit, thus settling a long-standing question raised 
by Karlin in the seminal paper of 1967. One striking consequence of 
our result is that the possible limit may only be a positive integer 
number. Some new conditions for other types of behavior of the vari- 
ance, like boundedness or convergence to infinity, are also obtained. 
The proofs are based on poissonization techniques.



1. Introduction. The classical occupancy problem is one of the
cornerstones of discrete probability, dating back to its early ages (and
hence encountered over and over again by generations of students
studying elementary probability through evergreen hits like the birthday
problem, the coupon collector's problem, etc. [?]). It still attracts a
lot of research interest, especially in recent years, mainly due to its
numerous applications across the board, from sampling statistics and
quality control to quantum physics, bioinformatics and computer science.
For an introduction to the field and a survey of the many models and
results, see [?] and further references to original work therein.



In this paper, we are concerned with a version of the occupancy problem
in an infinite urn scheme (first considered by Bahadur [3] and later on
studied by Darling [?] and most systematically by Karlin [25]), in which
the balls labeled 1, 2, ... are thrown independently at an infinite array
of boxes (urns) j = 1, 2, ..., with fixed probability (frequency) $p_j$ of
hitting box j. The frequencies $p_j$ are assumed to be strictly positive
and to satisfy

(1.1)    $\|p\| := \sum_{j=1}^{\infty} p_j = 1.$

Without loss of generality, we further assume that the sequence $(p_j)$ is
non-increasing, $p_1 \ge p_2 \ge \cdots$.

Let $K_n$ be the number of boxes discovered by the first n balls (i.e.,
occupied by at least one of the first n balls). Many other
interpretations of this functional appear in the literature: for
instance, when $(p_j)$ is considered as a probability distribution on
positive integers, $K_n$ is the number of distinct values occurring among
n random values sampled independently from $(p_j)$. Since there are
infinitely many boxes, $K_n$ increases unboundedly (with probability one)
as more balls are thrown, which also implies (e.g., by Fatou's lemma)
that the same is true for the expected number of occupied boxes,
$E(K_n)$. Moreover, as shown by Karlin [25, Theorem 8],
$\lim_{n\to\infty} K_n / E(K_n) = 1$ with probability one (an earlier
result about convergence in probability was obtained by Bahadur [3]).

The more delicate asymptotic properties of the random variable $K_n$ are
largely determined by its variance $V_n := \mathrm{Var}(K_n)$. It is known
that the distribution of $K_n$ converges to a normal distribution provided
that $V_n \to \infty$ as $n \to \infty$. The latter occurs, for instance,
when the frequencies have a power-like decay, $p_j \sim c j^{-\alpha}$
$(j \to \infty)$ with $\alpha > 1$ or, more generally, satisfy a condition
of regular variation [25]. (Here and throughout, c stands for a generic
positive constant, the specific value of which is not important.)

1.1. Main result: the case of converging variance. In this paper, we es- 
sentially focus on the opposite situation, that is, when Vn is uniformly 
bounded (and hence the distribution of Kn does not converge to normal). In 
particular, we prove the following surprising characterization of frequencies 
$(p_j)$ for which the variance $V_n$ tends to a finite limit as $n \to \infty$.

Theorem 1.1. A finite limit $v := \lim_{n\to\infty} V_n$ exists if and only
if for some integer $k \ge 1$ the frequencies satisfy the "lagged ratio"
condition

(1.2)    $\lim_{j\to\infty} \frac{p_{j+k}}{p_j} = \frac{1}{2},$

and in this case the limiting value v coincides with the lag k.

The striking consequence of this result is that whenever the finite limit
of the sequence $(V_n)$ exists, it must be a positive integer,
$v \in \mathbb{N}$.

The issue of converging variance was first raised in the seminal paper
by Karlin [25], where in particular he assessed as "formidable if not
impossible" the task of determining the behavior of the variance $V_n$
without some regularity assumptions. In particular, adopting the
condition of regular variation of the frequency tail, he came up with a
sufficient condition for the existence of a finite limit of $V_n$
[25, Theorem 2]. In fact, as we shall see below (in Section 5),
convergence to a finite limit, combined with the special dyadic structure
of the counting measure controlling the frequency input, is a regularity
condition in itself, being strong enough to ensure the result of
Theorem 1.1. (To be more precise, the "dyadic" feature mentioned above
pertains primarily to the poissonized version of the problem, i.e., with
a randomized number of balls; see Section 2 below.)

The prototypical (apparently folklore) instance of frequencies $(p_j)$
with converging variance $V_n$ is the geometric sequence of ratio 1/2
(i.e., $p_j = 2^{-j}$), where one can show with some effort that
$V_n \to 1$ as $n \to \infty$ (see [13] and [?]). Note that our
condition (1.2) is obviously satisfied here with k = 1, hence the result.
The mechanism leading to such a simple answer is a resonance of the
ratio q = 1/2 with the intrinsic dyadic structure of the variance,
resulting in massive cancelation of oscillating terms (again, in the
poissonized version, see Example 2.2 below). Recently, such cancelations
have been explained directly for the original model (i.e., for $V_n$)
using sophisticated analytic methods [?].

It seems to be less well known that for generic geometric frequencies
$p_j = c q^j$, the (finite) limit of $V_n$ exists if $q = 2^{-1/k}$
$(k \in \mathbb{N})$, with the limiting value v = k (see [23, §4,
page 15]). Again, using Theorem 1.1 one gets this answer immediately,
together with the "only if" statement; moreover, the same conclusion can
be readily extended to sequences $(p_j)$ from the parametric class
$\mathrm{RT}_q$ (see [?]), defined by the property

(1.3)    $\lim_{j\to\infty} \frac{p_{j+1}}{p_j} = q,$

thus asymptotically mimicking the geometric decay. (Some concrete
examples of distributions in the $\mathrm{RT}_q$ class, complementing the
geometric instance, will be given below in Section 1.3.) Indeed, in the
$\mathrm{RT}_q$ case equation (1.2) amounts to $q^k = 1/2$, whence
$q = 2^{-1/k}$. Of course, condition (1.3) is too restrictive for the
criterion (1.2), as can be seen for instance by merging k geometric
sequences of the same ratio q = 1/2 (and normalizing the resulting
sequence so as to satisfy (1.1)).
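As an illustrative numerical sketch (not part of the original argument), the merging construction just mentioned can be checked directly. The helper names `merged_geometric` and `lagged_ratio`, the scale factors 1, 0.8, 0.6 and the truncation length are assumptions made for this example only; the scale factors are taken in ]1/2, 1] so that the three sequences interleave.

```python
def merged_geometric(scales, n_terms):
    """Merge len(scales) geometric sequences a * 2**(-m), m = 1..n_terms,
    sort the result in non-increasing order and normalize it to total mass 1."""
    values = [a * 2.0 ** (-m) for a in scales for m in range(1, n_terms + 1)]
    values.sort(reverse=True)
    total = sum(values)
    return [x / total for x in values]


def lagged_ratio(p, j, k):
    """Return p_{j+k} / p_j, with 1-based indexing as in the text."""
    return p[j + k - 1] / p[j - 1]


if __name__ == "__main__":
    p = merged_geometric([1.0, 0.8, 0.6], 40)
    for j in range(1, 7):
        # lag 1 keeps cycling through several values; lag 3 is constant
        print(j, lagged_ratio(p, j, 1), lagged_ratio(p, j, 3))
```

In this toy instance the lag-3 ratio equals 1/2 for every j, so the lagged ratio condition holds with k = 3, while the lag-1 ratio keeps cycling and therefore has no limit.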

The following "decomposition" interpretation of Theorem 1.1 clarifies the
compound structure of frequency sequences $(p_j)$ that exhibit convergence
of the variance. Observe that by condition (1.2), the sequence $(p_j)$
splits in a disjoint fashion into k non-increasing subsequences
$p^{(i)}_j := p_{i+k(j-1)}$ $(i = 1, \dots, k)$, each belonging to the
$\mathrm{RT}_{1/2}$ class:

(1.4)    $(p_j) = \bigcup_{i=1}^{k} (p^{(i)}_j) : \quad \lim_{j\to\infty} \frac{p^{(i)}_{j+1}}{p^{(i)}_j} = \frac{1}{2} \quad (i = 1, \dots, k).$

Moreover, by the "if" part of Theorem 1.1, each of the k constituent
subsequences brings a unit contribution to the overall limiting variance
v = k.

Such a decomposition may be interpreted as splitting the initial array
of boxes 1, 2, ... into k infinite sub-arrays
$\{i + k(j-1),\ j = 1, 2, \dots\}$ $(i = 1, \dots, k)$, and allocating the
balls to boxes in a two-stage procedure as follows: for each ball, a
destination array is chosen independently with probabilities
$\|p^{(i)}\|$, and the ball is then thrown with the corresponding
(rescaled) frequencies $p^{(i)}_j / \|p^{(i)}\|$ $(j = 1, 2, \dots)$. The
additivity of the variance in this procedure, as predicted by
Theorem 1.1, may be somewhat surprising, given the apparent dependence of
the partial occupancy numbers $K^{(i)}_n$ $(i = 1, \dots, k)$. However,
additivity becomes quite transparent in the poissonized setting, where
the dependence between boxes is removed (see a remark in Section 2.2).

1.2. Geometric frequencies. Historically, there has been some confusion
about the converging variance in the geometric model. Controversy started
in [25, Example 6], where Karlin asserted that his sufficient condition
for convergence [25, Theorem 2] was satisfied for every geometric
sequence $p_j = c q^j$ $(0 < q < 1)$, with the limiting value given by
$v = \log_{1/q} 2$. As we have seen, this is false unless q belongs to
the countable set $\{2^{-1/k},\ k \in \mathbb{N}\}$. A more careful
inspection reveals that Karlin's condition, if applied accurately, does
yield the correct answer in the geometric case, properly discriminating
between convergence and divergence. Moreover, we have found, quite
unexpectedly, that Karlin's condition (decorated in [25] with some
superfluous assumptions and originally conceived as just a sufficient
condition) proves to be necessary and sufficient, being equivalent to our
own criterion proved in Lemma 5.1. We will discuss this link below, in
Section 5.4.

That there was something wrong with Example 6 in [25] was subsequently
pointed out by Dutko [13, page 1258], who noticed that $V_n$ is bounded
below by a positive constant, uniformly in n and q, hence the limit
$v = \log_{1/q} 2$ cannot be valid at least for small values of q (when
$\log_{1/q} 2$ gets arbitrarily close to zero). However, Dutko
[13, page 1258] apparently claimed that the limit of the variance fails
to exist for each $q \ne 1/2$, thus missing the other values,
$q = 2^{-1/k}$, $k > 1$. Unfortunately, he gave no details to support
such a conclusion, referring to his unpublished thesis [12], which is not
easily available.

More recent studies [?] have shed much light on the geometric model.
Hitczenko and Louchard [?] (motivated by random compositions of natural
numbers) were apparently the first to prove analytically that
$V_n = 1 + o(1)$ in the geometric case with q = 1/2, contrary to the
"popular belief" [?] that persistent oscillations are ubiquitous in
discrete random structures involving the geometric distribution (see,
e.g., [?]). Prodinger [31] gave an alternative proof of this asymptotics
(along with a similar result for a particular model of data search trees
called PATRICIA tries), proceeding from the general "oscillatory"
framework. Recently, Archibald et al. [2, Theorem 2] derived a very
precise asymptotic expansion

(1.5)    $V_n = \log_{1/q} 2 + \delta_V(\log_{1/q} n) + o(1) \quad (n \to \infty),$

where $\delta_V(x) := \delta_E(x + \log_{1/q} 2) - \delta_E(x)$ with
$\delta_E(\cdot)$ periodic of period 1 and zero mean (the latter function
emerges in a similar expansion for the expected value of $K_n$). If
q = 1/2 then $\log_{1/q} 2 = 1$, and from the expansion (1.5) it is seen
that the oscillating term vanishes due to 1-periodicity of
$\delta_E(\cdot)$, since $\delta_V(x) = \delta_E(x+1) - \delta_E(x) = 0$
(see [2, Appendix A, page 1079]). In fact, the same argument is true for
any $q = 2^{-1/k}$ $(k \in \mathbb{N})$, when $\log_{1/q} 2 = k$ and hence
$\delta_V(x) = \delta_E(x+k) - \delta_E(x) = 0$ (see [23, §4, page 15]).

1.3. Bounded variance and convergence to infinity. One can also wonder
about conditions for other possible types of behavior of the variance
$V_n$. We shall prove the following criterion of uniform boundedness,
again set in terms of the lagged ratio $p_{j+k}/p_j$ compared to the upper
threshold 1/2 [cf. (1.2)].

Theorem 1.2. The sequence $(V_n)$ is bounded if and only if there exists
a positive integer k such that the frequencies $(p_j)$ satisfy the
condition

(1.6)    $\limsup_{j\to\infty} \frac{p_{j+k}}{p_j} \le \frac{1}{2}.$

Moreover, if k is the least integer with the property (1.6), then $(V_n)$
satisfies a sharp asymptotic bound $\limsup_{n\to\infty} V_n \le k$.

This situation is exemplified by the generic geometric frequencies, with
arbitrary ratio 0 < q < 1. Another example is the Poisson frequencies
$p_j = c \lambda^j / j!$ $(\lambda > 0)$, where the variance $V_n$ is
bounded but does not converge: indeed, here
$p_{j+k}/p_j \sim (\lambda/j)^k \to 0$ as $j \to \infty$, hence (1.6) is
fulfilled whereas (1.2) fails. A larger class is that of quasi-binomial
distributions [?], given by
$p_j = (c/j!) \prod_{i=0}^{j-1} (\lambda + iq)$ with parameters
$\lambda > 0$, $0 \le q < 1$. (To explain the name, note that
$c^{-1} = (1-q)^{-\lambda/q} - 1$ for q > 0, while for q = 0 one has, in a
continuous fashion, $c^{-1} = e^{\lambda} - 1$, thus recovering the Poisson
normalization constant.) A somewhat similar but different parametric
family is given by the negative binomial distribution
$p_j = (c q^j / j!) \prod_{i=0}^{j-1} (\lambda + i) = c \binom{\lambda+j-1}{j} q^j$,
with $\lambda > 0$, 0 < q < 1 [here $c^{-1} = (1-q)^{-\lambda} - 1$].
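The normalization constants quoted for these parametric families can be checked numerically. The sketch below assumes that the quasi-binomial weights have the form $(1/j!) \prod_{i=0}^{j-1} (\lambda + iq)$ and the negative binomial weights the form $(q^j/j!) \prod_{i=0}^{j-1} (\lambda + i)$, for j = 1, 2, ...; the function names and the truncation length are illustrative choices of ours.

```python
import math


def quasi_binomial_weights(lam, q, n_terms):
    """Unnormalized weights w_j = (1/j!) * prod_{i=0}^{j-1} (lam + i*q),
    for j = 1..n_terms (assumed form of the quasi-binomial family)."""
    weights, w = [], 1.0
    for j in range(1, n_terms + 1):
        w *= (lam + (j - 1) * q) / j
        weights.append(w)
    return weights


def negative_binomial_weights(lam, q, n_terms):
    """Unnormalized weights w_j = (q**j / j!) * prod_{i=0}^{j-1} (lam + i),
    for j = 1..n_terms."""
    weights, w = [], 1.0
    for j in range(1, n_terms + 1):
        w *= q * (lam + (j - 1)) / j
        weights.append(w)
    return weights


if __name__ == "__main__":
    lam, q = 1.5, 0.4
    print(sum(quasi_binomial_weights(lam, q, 400)), (1 - q) ** (-lam / q) - 1)
    print(sum(quasi_binomial_weights(lam, 0.0, 400)), math.exp(lam) - 1)
    print(sum(negative_binomial_weights(lam, q, 400)), (1 - q) ** (-lam) - 1)
```

Under these assumptions the successive ratios $w_{j+1}/w_j$ tend to q in both families (and to 0 in the Poisson case q = 0 of the first one), which is the RT-type behavior used in the text.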

Note that all these examples belong to classes $\mathrm{RT}_q$ with
$0 \le q < 1$. It is possible to construct more general examples using the
"decomposition" reformulation of Theorem 1.2 in the spirit of (1.4), in
that the variance $V_n$ is uniformly bounded if and only if the sequence
$(p_j)$ may be split in a disjoint fashion into a finite number of
subsequences, each of which satisfies condition (1.6) with k = 1 (e.g.,
each from $\mathrm{RT}_{q_i}$ with $0 \le q_i \le 1/2$, i = 1, ..., k).

We shall also address the classical question of convergence to infinity
and produce new conditions ensuring that $V_n \to \infty$. Note, however,
that in contrast to the convergent or bounded cases, no necessary and
sufficient criteria are available without extra regularity assumptions.
To illustrate our results in this direction, let us formulate here two
sufficient conditions, the first of which is set in terms of the lagged
ratios $p_{j+k}/p_j$ against the lower threshold 1/2 [cf. (1.6)], while the
second one is based on the "tail ratio"

(1.7)    $\rho_j := \frac{1}{p_j} \sum_{i>j} p_i.$

Theorem 1.3. Suppose that for each integer $k \ge 1$,

(1.8)    $\liminf_{j\to\infty} \frac{p_{j+k}}{p_j} \ge \frac{1}{2}.$

Then it follows that

(1.9)    $\lim_{j\to\infty} \rho_j = \infty,$

which in turn implies that $V_n \to \infty$ as $n \to \infty$.

Examples for Theorem 1.3 are immediately supplied by the class
$\mathrm{RT}_1$, where condition (1.8) is obviously satisfied for any
$k \ge 1$. More complex examples (not in $\mathrm{RT}_1$) will be
constructed in Sections 4.1 and 4.3.

Remark. The tail ratio (1.7) can be expressed as $\rho_j = (1 - h_j)/h_j$,
where $h_j = p_j / \sum_{i \ge j} p_i$ is the discrete-time hazard rate, a
key characteristic in reliability theory and survival analysis (see,
e.g., [?]). The latter quantity also appears in extreme value theory in
connection with records from discrete distributions, where it is
interpreted as the probability that j is a record value (see, e.g., [?]).
In the occupancy context, condition (1.9) is related to the "probability
of a tie for first place" $P\{X_{n,M_n} = 1\}$, where
$M_n := \max\{j : X_{n,j} \ne 0\}$ is the largest index among the occupied
boxes after n throws. More specifically, it has been proved [?] that
condition (1.9) is satisfied if and only if

(1.10)   $P\{X_{n,M_n} = 1\} \to 1 \quad (n \to \infty),$

and moreover, if (1.9) fails then $P\{X_{n,M_n} = 1\}$ does not converge at
all. This, combined with Theorem 4.3, shows that (1.10) implies both
$V_n \to \infty$ and $\Phi_{n,1} \to \infty$, which is a surprising
connection between the behavior in the extreme-value range and the global
characteristics of the sample. These facts equally apply to the
poissonized model.
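For a concrete feel of the tail ratio and the hazard rate, here is a small sketch on truncated frequency lists. The function names are ours, and the truncation means that tails are only approximated (accurately so when the discarded mass is negligible relative to $p_j$).

```python
def tail_ratio(p, j):
    """rho_j = (1/p_j) * sum_{i>j} p_i for a finite list p (1-based j)."""
    return sum(p[j:]) / p[j - 1]


def hazard_rate(p, j):
    """h_j = p_j / sum_{i>=j} p_i for a finite list p (1-based j)."""
    return p[j - 1] / sum(p[j - 1:])


if __name__ == "__main__":
    for q in (0.5, 0.9):
        p = [q ** j for j in range(1, 2001)]
        h = hazard_rate(p, 5)
        # for geometric frequencies: h_j = 1 - q and rho_j = q / (1 - q)
        print(q, h, tail_ratio(p, 5), (1.0 - h) / h)
```

For geometric frequencies the tail ratio is constant, so condition (1.9) fails, in agreement with the boundedness of the variance in that model.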

1.4. Outline. The rest of the paper is organized as follows. Section 2
contains general formulas and introduces the poissonization technique. In
Section 3 we connect the variance $V_n$ with the mean number of singletons
(i.e., the boxes occupied by exactly one of the first n balls) and derive
useful upper bounds. We also obtain here a basic integral representation
of the poissonized variance V(t) via the Laplace transform of the
function $\Delta\nu(x)$, counting the frequencies $p_j$ in the interval
]x/2, x], and relate the threshold values of $\Delta\nu(\cdot)$ to the
lagged ratios $p_{j+k}/p_j$. This analysis culminates in the proof of
Theorem 1.2. In Section 4 various sufficient conditions for
$V_n \to \infty$ are derived, which covers the content of Theorem 1.3. We
also show that these conditions are not necessary, by constructing
examples of weird oscillatory behavior. In Section 5 we derive a simple
integral condition in terms of the function $\Delta\nu(\cdot)$, necessary
and sufficient in order that V(t) converge to a finite limit. This
criterion is then used to prove Theorem 1.1. In conclusion, we
rehabilitate Karlin's sufficient condition of convergence, by showing
that it is in fact necessary and sufficient.

2. Poissonization and moment formulas. Let $X_{n,j}$ be the occupancy
number of box j after n throws, that is, the number of balls out of the
first n that land in box j. Note that

(2.1)    $K_n = \sum_{j=1}^{\infty} 1\{X_{n,j} > 0\},$

where 1(A) is the indicator of event A (i.e., with value 1 when A is true
and 0 otherwise). Because $\sum_{j=1}^{\infty} X_{n,j} = n$, it is clear
that the terms in the sum (2.1) are not independent.

2.1. Poissonization. A common recipe to circumvent the dependence (see
[?] for a general introduction and [?] for details in the occupancy
problem context) is to consider a closely related model in which the
balls are thrown at the jump times of a unit rate Poisson process
$(N(t),\ t \ge 0)$: by this randomization the balls appear in boxes
according to independent Poisson processes $X_j(t)$, with rate $p_j$ for
box j. A further advantage of the poissonized model is that the
normalization (1.1) can be replaced by a weaker summability condition
$\|p\| = \sum_{j=1}^{\infty} p_j < \infty$, thus allowing one to avoid
computing normalization constants in expressions for $p_j$. Clearly, the
normalization can always be maintained by rescaling the frequencies
$p_j \mapsto \|p\|^{-1} p_j$, to the effect of a linear time change,
$t \mapsto \|p\| t$.

In what follows, we adopt the convention that quantities derived from the 
poissonized version of the occupancy problem are written as functions of the 
continuous time parameter t, while for the original model we preserve the 
notation with lower index n. In particular, we write $X_j(t)$ (cf. above)
for the number of balls that land in box j by time t and

(2.2)    $K(t) := K_{N(t)} = \sum_{j=1}^{\infty} 1\{X_j(t) > 0\}$

for the number of boxes discovered by the Poisson process N(t). Likewise,
denoting by $K_{n,r}$ the number of boxes, each of which is hit by exactly
r of the first n balls, we write

         $K_r(t) := K_{N(t),r} = \sum_{j=1}^{\infty} 1\{X_j(t) = r\}$

for the corresponding poissonized quantity (which is the number of boxes
containing exactly r balls each by time t). Clearly,

(2.3)    $K_n = \sum_r K_{n,r}, \quad K(t) = \sum_r K_r(t), \quad n = \sum_r r K_{n,r}, \quad N(t) = \sum_r r K_r(t).$

For the mean values of the number of occupied boxes we have the formulas

(2.4)    $\Phi_n := E(K_n) = \sum_{j=1}^{\infty} (1 - (1-p_j)^n),$

(2.5)    $\Phi(t) := E(K(t)) = \sum_{j=1}^{\infty} (1 - e^{-t p_j}),$

related by the poissonization identity

         $\Phi(t) = e^{-t} \sum_{n=0}^{\infty} \frac{t^n}{n!} \Phi_n,$

where $\Phi_0 = 0$. Encoding the collection of frequencies into an infinite
counting measure on $\mathbb{R}_+ = ]0, \infty[$

(2.6)    $\nu(dx) := \sum_{j=1}^{\infty} \delta_{p_j}(dx)$

(where $\delta_x$ is the Dirac mass at x, i.e., $\delta_x(A) = 1\{x \in A\}$
for $A \subset \mathbb{R}_+$), we can represent the mean values (2.4),
(2.5) in an integral form as

(2.7)    $\Phi_n = \int_0^1 (1 - (1-x)^n)\, \nu(dx),$

(2.8)    $\Phi(t) = \int_0^{\infty} (1 - e^{-tx})\, \nu(dx).$
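The relation between the fixed-n mean and its poissonized counterpart can be illustrated numerically through the standard Poisson mixing formula $\Phi(t) = e^{-t} \sum_{n \ge 0} (t^n/n!)\, \Phi_n$, which holds term by term in j. The following sketch uses function names and truncation levels of our own choosing.

```python
import math


def mean_fixed(p, n):
    """Phi_n = sum_j (1 - (1 - p_j)**n), cf. (2.4), for a finite list p."""
    return sum(1.0 - (1.0 - pj) ** n for pj in p)


def mean_poissonized(p, t):
    """Phi(t) = sum_j (1 - exp(-t p_j)), cf. (2.5), for a finite list p."""
    return sum(-math.expm1(-t * pj) for pj in p)


def poisson_mixture_of_means(p, t, n_max):
    """e^{-t} * sum_{n=0}^{n_max} t^n / n! * Phi_n (truncated Poisson mixture)."""
    total, weight = 0.0, math.exp(-t)
    for n in range(n_max + 1):
        total += weight * mean_fixed(p, n)
        weight *= t / (n + 1)  # Poisson weight for the next n
    return total


if __name__ == "__main__":
    p = [2.0 ** (-j) for j in range(1, 61)]
    print(mean_poissonized(p, 5.0), poisson_mixture_of_means(p, 5.0, 80))
```

Since the identity holds for each box separately, truncating the list of frequencies does not affect the comparison; only the Poisson series is cut off, at a level where the remaining weights are negligible.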

Remark. When the frequencies are normalized by (1.1), then all $p_j \le 1$
and the integral in (2.8) could be written in the limits from 0 to 1,
similarly to (2.7). In the poissonized model, specific normalization is
not important, so we prefer to use a more flexible notation as in (2.8).
The same convention applies to similar representations below (see, e.g.,
formulas (2.10) and (2.13)).

Furthermore, set

(2.9)    $\Phi_{n,r} := E(K_{n,r}) = \binom{n}{r} \int_0^1 x^r (1-x)^{n-r}\, \nu(dx),$

(2.10)   $\Phi_r(t) := E[K_r(t)] = \frac{t^r}{r!} \int_0^{\infty} x^r e^{-tx}\, \nu(dx),$

the latter being related to the derivatives of $\Phi(t)$ via

         $\Phi_r(t) = (-1)^{r+1} \frac{t^r}{r!}\, \Phi^{(r)}(t).$

Note that equations (2.3) imply

(2.11)   $\Phi_n = \sum_r \Phi_{n,r}, \quad \Phi(t) = \sum_r \Phi_r(t), \quad n = \sum_r r\, \Phi_{n,r}, \quad t = \sum_r r\, \Phi_r(t).$

An analyst will recognize in (2.8) a Bernstein function (see [?]) with
the following general properties (see also [?]).

Lemma 2.1. If an infinite measure $\nu$ on $\mathbb{R}_+$ satisfies
$\int_0^{\infty} (1 - e^{-x})\, \nu(dx) < \infty$, then (2.8) defines a
function $\Phi(\cdot)$ which

(i) is analytic in the right half-plane,

(ii) has alternating derivatives $(-1)^{r+1} \Phi^{(r)}(t) > 0$ (t > 0),

(iii) satisfies $\Phi(t) \uparrow \infty$ but $\Phi(t) = o(t)$ as
$t \to \infty$.

Conversely, if a function $\Phi(t)$ on $[0, \infty[$ has the properties
(ii) and (iii) along with $\Phi(0) = 0$, then there exists a unique
infinite measure $\nu$ on $\mathbb{R}_+$ such that representation (2.8)
holds.

2.2. The variance of the number of occupied boxes. By the independence
of summands in (2.2), the variance of K(t) is given by

(2.12)   $V(t) := \mathrm{Var}(K(t)) = \sum_{j=1}^{\infty} (e^{-t p_j} - e^{-2 t p_j}),$

which is the same as

(2.13)   $V(t) = \int_0^{\infty} (e^{-tx} - e^{-2tx})\, \nu(dx) = \Phi(2t) - \Phi(t).$

Example 2.2. For geometric frequencies of ratio q = 1/2, that is,
$p_j = 2^{-j}$ (j = 1, 2, ...), the sum (2.12) is evaluated explicitly
thanks to telescoping of partial sums (see [13, page 1258]):

         $V(t) = \lim_{M\to\infty} \sum_{j=1}^{M} (e^{-t 2^{-j}} - e^{-t 2^{-j+1}}) = \lim_{M\to\infty} (e^{-t 2^{-M}} - e^{-t}) = 1 - e^{-t}.$

In particular, it follows that $V(t) \to 1$ as $t \to \infty$. More
generally, a similar simplification occurs in the geometric case with the
ratio $q = 2^{-1/k}$ $(k \ge 1)$, where it is convenient to split the sum
in (2.12) into k sub-sums (over $j = i + k(\ell - 1)$, where
i = 1, ..., k, $\ell = 1, 2, \dots$), each involving a (non-normalized)
geometric sequence with ratio 1/2. Applying the previous result (with
q = 1/2) and adding up the k unit contributions emerging in the limit
from the k constituent subsequences, we obtain the convergence
$V(t) \to k$ as $t \to \infty$. For other values of q the formula for the
variance does not simplify.
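The explicit evaluation in this example is easy to reproduce numerically. The sketch below is only an illustration: the function names, the truncation of the frequency sequence and the sample values of t are our own choices, and the frequencies are left non-normalized, which the poissonized model permits.

```python
import math


def poissonized_variance(p, t):
    """V(t) = sum_j (exp(-t p_j) - exp(-2 t p_j)) for a finite list p.
    Each term is written as exp(-a) * (1 - exp(-a)) with a = t p_j to avoid
    cancellation for small a."""
    return sum(-math.exp(-t * pj) * math.expm1(-t * pj) for pj in p)


def geometric_frequencies(q, n_terms):
    """Non-normalized geometric frequencies p_j = q**j, j = 1..n_terms."""
    return [q ** j for j in range(1, n_terms + 1)]


if __name__ == "__main__":
    half = geometric_frequencies(0.5, 200)
    print(poissonized_variance(half, 3.0), 1.0 - math.exp(-3.0))
    for k in (2, 3):
        p = geometric_frequencies(2.0 ** (-1.0 / k), 300 * k)
        print(k, poissonized_variance(p, 1.0e4))
    # a ratio outside the set {2**(-1/k)}: the values keep oscillating in t
    third = geometric_frequencies(0.3, 200)
    print([poissonized_variance(third, 10.0 ** e) for e in range(2, 6)])
```

The truncation is harmless as long as t times the smallest retained frequency is far below machine precision, which is the case for the parameters used here.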

Remark. The poissonized variance is additive: if $(p^{(1)}_j)$ and
$(p^{(2)}_j)$ are two summable sequences of frequencies, and if $(p_j)$ is
obtained by merging them into a single sequence, then the corresponding
variances satisfy $V^{(1)}(t) + V^{(2)}(t) = V(t)$. This explains the
structural decomposition of the variance mentioned in the Introduction
and illustrated in Example 2.2.

The fixed-n counterpart of (2.12) is

(2.14)   $V_n = \Phi_{2n} - \Phi_n + \sum_{i \ne j} ((1 - p_i - p_j)^n - (1-p_i)^n (1-p_j)^n),$

where the cross-terms arise due to dependence in (2.1).
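The fixed-n variance can also be evaluated directly as the sum of the variances of the box indicators plus their covariances, where the covariance of two distinct boxes i, j equals $(1 - p_i - p_j)^n - (1-p_i)^n (1-p_j)^n$. The sketch below does this by brute force for a truncated geometric sequence; the function name and the truncation level are assumptions of ours, and the double loop is meant for short lists only.

```python
def fixed_n_variance(p, n):
    """Variance of the number of occupied boxes after n throws, computed
    from indicator variances and pairwise covariances for a finite list p."""
    empty = [(1.0 - pj) ** n for pj in p]       # P(box j is empty)
    diagonal = sum(e - e * e for e in empty)    # equals Phi_{2n} - Phi_n
    cross = 0.0
    for i, pi in enumerate(p):
        for j, pj in enumerate(p):
            if i != j:
                # covariance of the indicators of boxes i and j
                cross += (1.0 - pi - pj) ** n - empty[i] * empty[j]
    return diagonal + cross


if __name__ == "__main__":
    p = [2.0 ** (-j) for j in range(1, 61)]
    for n in (10, 100, 1000):
        print(n, fixed_n_variance(p, n))
```

For the geometric frequencies $p_j = 2^{-j}$ the printed values should approach 1 as n grows; boxes beyond the truncation level are hit with negligible probability for the values of n used here.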

2.3. Depoissonization. According to [?, Proposition 4.3(ii)], the
variances V(n) and $V_n$ are always of the same order,

(2.15)   $0 < \liminf_{n\to\infty} \frac{V(n)}{V_n} \le \limsup_{n\to\infty} \frac{V(n)}{V_n} < \infty.$

In the next lemma, we establish estimates for the deviation of the
poissonized quantities from their fixed-n counterparts in terms of
higher-order moments, which will be instrumental for depoissonization in
the case of bounded variance (see Section 3).

Lemma 2.3. If the normalization (1.1) holds then

(2.16)   $\Phi(n) - \Phi_n = O(n^{-1})\, \Phi_2(n),$

(2.17)   $V(n) - V_n = O(n^{-1}) (\Phi_1(n)^2 + \Phi_2(n)),$

and for each r = 1, 2, ...

(2.18)   $\Phi_r(n) - \Phi_{n,r} = O(n^{-1}) (\Phi_r(n) + \Phi_{r+1}(n) + \Phi_{r+2}(n)).$
Proof. We shall need the elementary inequalities

(2.19)   $0 \le e^{-nx} - (1-x)^n \le n x^2 e^{-nx} \quad (0 \le x \le 1).$

The first inequality is obvious, while the second one follows from the
estimate

         $(1-x)^n \ge (1-x^2)^n e^{-nx} \ge (1 - n x^2)\, e^{-nx}.$

Now, using representations (2.7), (2.8) (rewriting the integral (2.8) in
the limits from 0 to 1, due to (1.1)) and inserting the bounds (2.19), we
obtain

         $0 \le \Phi_n - \Phi(n) = \int_0^1 (e^{-nx} - (1-x)^n)\, \nu(dx) \le \frac{2}{n}\, \Phi_2(n),$

which proves (2.16). Next, from (2.9) and (2.10) we get

(2.20)   $\Phi_r(n) - \Phi_{n,r} = O(n^{-1})\, \Phi_r(n) + \frac{n^r}{r!} \int_0^1 x^r (e^{-nx} - (1-x)^{n-r})\, \nu(dx).$

By the inequalities (2.19), for each $x \in [0, 1]$

(2.21)   $e^{-nx} - (1-x)^{n-r} \ge e^{-nx} - e^{-(n-r)x} \ge -(e^r - 1)\, x e^{-nx},$

(2.22)   $e^{-nx} - (1-x)^{n-r} \le e^{-nx} - (1-x)^n \le n x^2 e^{-nx}.$

Substituting the estimates (2.21) and (2.22) into (2.20) and recalling
the notation (2.10) yields (2.18).

Finally, as shown in [?, Theorem 2.3], the cross-terms in (2.14) can be
evaluated as

         $(1-p_i)^n (1-p_j)^n - (1 - p_i - p_j)^n = n p_i p_j (1-p_i)^{n-1} (1-p_j)^{n-1} + O(n^2 p_i^2 p_j^2 (1-p_i)^{n-2} (1-p_j)^{n-2}).$

Inserting this estimate into (2.14) and summing over all $i \ne j$, we
obtain

(2.23)   $V_n = \Phi_{2n} - \Phi_n + O(n^{-1})\, \Phi_{n,1}^2 + O(n^{-2})\, \Phi_{n,2}^2.$

From (2.11) and (2.7) it follows that if the condition (1.1) holds then

         $\Phi_{n,r} \le \Phi_n = \int_0^1 (1 - (1-x)^n)\, \nu(dx) \le \int_0^1 n x\, \nu(dx) = n,$

and similarly, using (2.8),

         $\Phi_r(n) \le \Phi(n) = \int_0^{\infty} (1 - e^{-nx})\, \nu(dx) \le \int_0^{\infty} n x\, \nu(dx) = n.$

Hence, subtracting (2.23) from (2.13) and using the estimates (2.16) and
(2.18), we arrive at (2.17). □

3. Bounded variance. In this section, we mainly focus on the situation
where the variance V(t) is bounded.

3.1. Auxiliary estimates. We first derive various useful inequalities
involving the functions V(t), $\Phi(t)$, $\Phi_r(t)$ and the measure
$\nu$. Since $\Phi'(t)$ is decreasing and $V(t) = \Phi(2t) - \Phi(t)$, the
mean value theorem yields

         $\Phi'(2t) \le \frac{\Phi(2t) - \Phi(t)}{t} = \frac{V(t)}{t} \le \Phi'(t),$

or equivalently, since $\Phi_1(t) = t\, \Phi'(t)$,

(3.1)    $\frac{1}{2}\, \Phi_1(2t) \le V(t) \le \Phi_1(t).$

The first inequality in (3.1) generalizes.

Lemma 3.1. For r = 1, 2, ... and t > 0,

         $\Phi_r(t) \le \frac{2^{r(r+1)/2}}{r!}\, V(2^{-r} t).$

Proof. Setting $f_r(t) := (-1)^{r+1} \Phi^{(r)}(t) > 0$ (see
Lemma 2.1(ii)), we shall prove by induction the equivalent inequality

(3.2)    $f_r(t) \le \frac{2^{r(r+1)/2}\, V(2^{-r} t)}{t^r} \quad (t > 0).$

The base case r = 1 is the first inequality in (3.1) with t replaced by
t/2. Suppose (3.2) holds for $f_1, \dots, f_{r-1}$. Note that
$f''_{r-1}(t) = f_{r+1}(t) > 0$, hence the function $f_{r-1}$ is convex and
therefore

(3.3)    $\frac{f_{r-1}(t/2) - f_{r-1}(t)}{t/2} \ge -f'_{r-1}(t) = f_r(t).$

On the other hand, since $f_{r-1}(t) > 0$ and by the induction hypothesis,

(3.4)    $\frac{f_{r-1}(t/2) - f_{r-1}(t)}{t/2} \le \frac{f_{r-1}(t/2)}{t/2} \le \frac{2^{(r-1)r/2}\, V(2^{-r} t)}{(t/2)^r}.$

Combining (3.3) and (3.4), we obtain (3.2) for $f_r$. Thus, the induction
step follows, and the proof is complete. □
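The two-sided bound (3.1) and the estimate of Lemma 3.1 can be spot-checked numerically. The sketch below is only an illustration on a finite list of frequencies (for which the same inequalities hold, since the argument uses only positivity and convexity); the function names and the choice of Poisson-type frequencies with $\lambda = 3$ are ours.

```python
import math


def phi_r(p, r, t):
    """Phi_r(t) = sum_j (t p_j)**r / r! * exp(-t p_j) for a finite list p."""
    return sum((t * pj) ** r / math.factorial(r) * math.exp(-t * pj) for pj in p)


def poissonized_variance(p, t):
    """V(t) = sum_j (exp(-t p_j) - exp(-2 t p_j)) for a finite list p."""
    return sum(-math.exp(-t * pj) * math.expm1(-t * pj) for pj in p)


def lemma_3_1_bound(p, r, t):
    """Right-hand side 2**(r(r+1)/2) / r! * V(2**(-r) t) of Lemma 3.1."""
    factor = 2.0 ** (r * (r + 1) // 2) / math.factorial(r)
    return factor * poissonized_variance(p, 2.0 ** (-r) * t)


def poisson_frequencies(lam, n_terms):
    """Normalized weights proportional to lam**j / j!, j = 1..n_terms."""
    weights, w = [], 1.0
    for j in range(1, n_terms + 1):
        w *= lam / j
        weights.append(w)
    total = sum(weights)
    return [x / total for x in weights]


if __name__ == "__main__":
    p = poisson_frequencies(3.0, 60)
    for t in (0.5, 5.0, 50.0, 500.0):
        print(t, 0.5 * phi_r(p, 1, 2 * t), poissonized_variance(p, t), phi_r(p, 1, t))
        print(t, [(phi_r(p, r, t), lemma_3_1_bound(p, r, t)) for r in (1, 2, 3)])
```

In each printed pair the first entry should not exceed the second, and the three numbers in the first line of each block should be non-decreasing.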

Consider the limits superior

(3.5)    $v := \limsup_{t\to\infty} V(t), \qquad \varphi_r := \limsup_{t\to\infty} \Phi_r(t) \quad (r = 1, 2, \dots).$

By continuity, V(t) is uniformly bounded on $[0, \infty[$ if and only if
$v < \infty$, and the same is true for $\Phi_r(t)$ in terms of the
condition $\varphi_r < \infty$.

Note that v is strictly positive (cf. [13, page 1258]); indeed, setting
$t = 1/p_k$ in (2.12) we have

(3.6)    $v \ge \limsup_{k\to\infty} \sum_{j=1}^{\infty} (e^{-p_j/p_k} - e^{-2p_j/p_k}) \ge e^{-1} - e^{-2} > 0.$

Corollary 3.2. The conditions $v < \infty$ and $\varphi_1 < \infty$ are
equivalent and imply $\varphi_r < \infty$ for all $r \ge 1$.

Proof. Follows from (3.1) and Lemma 3.1. □

Appealing to Lemma 2.3 we have depoissonization in terms of moments.



14 L. V. BOGACHEV, A. V. GNEDIN AND YU. V. YAKUBOVICH 

Corollary 3.3. If $v < \infty$ then, as $n \to \infty$,

         $\Phi(n) - \Phi_n = O(n^{-1}), \qquad V(n) - V_n = O(n^{-1}),$

and, for all $r \ge 1$,

         $\Phi_r(n) - \Phi_{n,r} = O(n^{-1}).$

3.2. Uniform upper bounds for $\varphi_r$. Lemma 3.1 entails an estimate
of $\varphi_r$ through either v or $\varphi_1$. With some more effort, we
will derive an improved upper bound that does not depend on r. Recall that
the measure $\nu$ is defined in (2.6), and consider the new (finite)
measure

(3.7)    $\hat\nu(dx) := x\, \nu(dx) = \sum_{j=1}^{\infty} p_j\, \delta_{p_j}(dx).$

When the normalization (1.1) holds, this is a probability measure
governing the frequency distribution of the random box discovered by
ball 1. Using the measure $\hat\nu$, we can rewrite (2.10) as follows:

(3.8)    $\Phi_r(t) = \frac{t^r}{r!} \int_0^{\infty} x^{r-1} e^{-xt}\, \hat\nu(dx).$

Also, let us set

(3.9)    $\eta := \limsup_{x \downarrow 0} \frac{\hat\nu[0, x]}{x}.$

Lemma 3.4. Suppose that $v < \infty$. Then for all r = 1, 2, ...

(3.10)   $\varphi_r \le \eta \le e\, \varphi_1 \le 2e\, v.$

Proof. Note that the last inequality in (3.10) follows from (3.1).
Further, integrating by parts in (3.8) and using the substitution
y = xt, we get

(3.11)   $\Phi_r(t) = \frac{t}{r!} \int_0^{\infty} e^{-y} y^{r-2} (y + 1 - r)\, \hat\nu[0, y/t]\, dy.$

For r = 1, due to monotonicity of the function $\hat\nu[0, \cdot]$, (3.11)
implies

(3.12)   $\Phi_1(t) \ge t \int_1^{\infty} e^{-y}\, \hat\nu[0, y/t]\, dy \ge e^{-1}\, t\, \hat\nu[0, 1/t],$

and by letting here $t \to \infty$ we obtain $\varphi_1 \ge e^{-1} \eta$
(see (3.5), (3.9)). On the other hand, for any $r \ge 1$ from (3.11) it
follows that

         $\Phi_r(t) \le \frac{1}{r!} \int_0^{\infty} e^{-y} y^{r}\, \frac{\hat\nu[0, y/t]}{y/t}\, dy,$

which implies $\varphi_r \le \eta$ by the "limsup" part of Fatou's lemma
[?, §IV.2]. □

3.3. Growth of the mean number of occupied boxes. Lemma 3.4 implies that
if $v < \infty$ then each term in the decomposition
$\Phi(t) = \sum_{r \ge 1} \Phi_r(t)$ makes a uniformly bounded contribution
to $\Phi(t) \to \infty$. This is to be contrasted with the case of
frequencies akin to $p_j \sim c j^{-\alpha}$ $(\alpha > 1)$, where V(t),
$\Phi(t)$ and $\Phi_r(t)$ $(r \ge 1)$ are of the same order
$O(t^{1/\alpha})$ as $t \to \infty$ (see [25]). The next lemma estimates
the growth of $\Phi(t)$ in the case of bounded variance.

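Lemma 3.5. Suppose that $v < \infty$. Then

         $\limsup_{t\to\infty} \frac{\Phi(t)}{\log t} \le 2v.$

Proof. For any $\varepsilon > 0$, there exists $t_0 > 0$ such that for all
$t \ge t_0$

         $\Phi_1(t) \le \varphi_1 + \varepsilon \le 2v + \varepsilon,$

due to Lemma 3.4. Therefore,

         $\Phi(t) - \Phi(t_0) = \int_{t_0}^{t} \Phi'(s)\, ds = \int_{t_0}^{t} \frac{\Phi_1(s)}{s}\, ds \le (2v + \varepsilon)(\log t - \log t_0).$

Hence,

         $\limsup_{t\to\infty} \frac{\Phi(t)}{\log t} = \limsup_{t\to\infty} \frac{\Phi(t) - \Phi(t_0)}{\log t - \log t_0} \le 2v + \varepsilon,$

and since $\varepsilon > 0$ is arbitrary, our claim follows.

A shorter proof is by a simple "lim sup" version of L'Hôpital's rule:

         $\limsup_{t\to\infty} \frac{\Phi(t)}{\log t} \le \limsup_{t\to\infty} \frac{\Phi'(t)}{1/t} = \limsup_{t\to\infty} \Phi_1(t) = \varphi_1 \le 2v,$

due to Lemma 3.4. □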



3.4. The basic representation of the variance V(t). As in [25], it is
convenient to rewrite the formula (2.13) for the variance as a single
integral representation. Recall that $\nu$ is given by (2.6), and
introduce the function

(3.13)   $\Delta\nu(x) := \nu]x/2, x] = \#\{j : x/2 < p_j \le x\} \quad (x > 0).$

Lemma 3.6. The variance V(t) can be represented as

(3.14)   $V(t) = t \int_0^{\infty} e^{-tx}\, \Delta\nu(x)\, dx \quad (t > 0).$

Proof. Setting $\nu_c(x) := \nu]x, \infty[$ and integrating by parts in
(2.13) gives

         $V(t) = \int_0^{\infty} (e^{-2tx} - e^{-tx})\, d\nu_c(x)$

         $\quad = (e^{-2tx} - e^{-tx})\, \nu_c(x) \Big|_0^{\infty} + t \int_0^{\infty} e^{-tx} (\nu_c(x/2) - \nu_c(x))\, dx$

         $\quad = t \lim_{x \downarrow 0} x\, \nu_c(x) + t \int_0^{\infty} e^{-tx}\, \Delta\nu(x)\, dx,$

and (3.14) will follow if we show that $x\, \nu_c(x) \to 0$ as
$x \downarrow 0$. To this end, note that the first moment of the measure
$\nu$ is finite: $\int_0^{\infty} x\, \nu(dx) = \sum_{j=1}^{\infty} p_j < \infty$.
Hence, integration by parts yields

(3.15)   $\infty > \int_0^{\infty} x\, \nu(dx) = \lim_{x \downarrow 0} x\, \nu_c(x) + \int_0^{\infty} \nu_c(x)\, dx,$

and it follows that the limit in (3.15) exists and, moreover, must vanish,
for otherwise the integral on the right-hand side of (3.15) would
diverge. □
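The integral representation of the variance through the counting function of the interval ]x/2, x] can be verified numerically on a finite list of frequencies. In the sketch below (function names and parameters are ours) the counting function is piecewise constant with breakpoints at $p_j$ and $2p_j$, so the Laplace-type integral is evaluated exactly on each piece and compared with the direct sum over boxes.

```python
import math


def delta_nu(p, x):
    """Number of frequencies p_j with x/2 < p_j <= x."""
    return sum(1 for pj in p if x / 2.0 < pj <= x)


def variance_direct(p, t):
    """V(t) = sum_j (exp(-t p_j) - exp(-2 t p_j))."""
    return sum(-math.exp(-t * pj) * math.expm1(-t * pj) for pj in p)


def variance_via_delta_nu(p, t):
    """t * integral of exp(-t x) * delta_nu(x) dx, evaluated exactly on each
    interval between consecutive breakpoints p_j, 2 p_j, where delta_nu is
    constant (and zero outside the range of breakpoints)."""
    breaks = sorted(set(p) | {2.0 * pj for pj in p})
    total = 0.0
    for a, b in zip(breaks, breaks[1:]):
        level = delta_nu(p, 0.5 * (a + b))  # constant value on [a, b[
        total += level * (math.exp(-t * a) - math.exp(-t * b))
    return total


if __name__ == "__main__":
    p = [0.7 ** j for j in range(1, 61)]
    for t in (1.0, 10.0, 100.0):
        print(t, variance_direct(p, t), variance_via_delta_nu(p, t))
```

The two columns printed for each t should agree up to rounding error.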

Corollary 3.7. The function

(3.16)   $D(x) := \int_0^{x} \Delta\nu(u)\, du$

is well defined and uniformly bounded for all $x \ge 0$. In particular,
D(0) = 0.

Proof. Letting t = 1 in (3.14), we obtain

         $V(1) \ge \int_0^{x} e^{-u}\, \Delta\nu(u)\, du \ge e^{-x} \int_0^{x} \Delta\nu(u)\, du,$

hence $D(x) \le e^{x} V(1) < \infty$ for any $x \ge 0$. Vanishing at zero is
obtained by the absolute continuity of the integral. Finally, boundedness
of D(x) follows because $\Delta\nu(x) = 0$ for all x large enough. □

Integrating by parts in (3.14) and using Corollary 3.7 we obtain an
alternative representation, which will also be useful:

(3.17)   $V(t) = t^2 \int_0^{\infty} e^{-tx} D(x)\, dx = \int_0^{\infty} e^{-y}\, y\, \frac{D(y/t)}{y/t}\, dy \quad (t > 0).$

3.5. Estimates using the function Az^(x). It is immediately clear from 
that if Az^(x) < c for all x > then V{t) < c for all t > 0. Moreover, 
one can obtain two-sided asymptotic bounds as follows. 
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Lemma 3.8. Recall that v is given by (|3.5|) . and set 

w := limsup Az^(x). 

Then v < oo if and only if w < oo, and in this case 
(3.18) {V5-2)w <v <iu. 

Proof. The substitution y = tx in (|3.14jl yields 

Vit)= / e~yA,,iy/t)dy, 
Jo 

and an application of the "hmsup" part of Fatou's lemma §IV.2] imphes 

POO 

V < w dy = w . 

Jo 

For the converse inequality, we need to exploit the special structure of the
measure $\nu$. Fixing $x > 0$ and retaining in (2.12) the terms with $p_j \in\, ]x/2, x]$
only, we obtain

(3.19)  $V(t) \ge \Delta\nu(x) \min_{p \in [x/2,\,x]} \left( e^{-tp} - e^{-2tp} \right).$

It is clear that the minimum in (3.19) is attained at one of the endpoints,
that is, $p = x/2$ or $p = x$. Setting $y = e^{-tx/2} \in [0, 1]$, we note that

$\min\{\, y - y^2,\; y^2 - y^4 \,\} = \begin{cases} y^2 - y^4, & 0 \le y \le \varphi, \\ y - y^2, & \varphi \le y \le 1, \end{cases}$

where $\varphi = (\sqrt{5} - 1)/2$ is the golden ratio, which appears here as the root
of the equation $y^2 - y^4 = y - y^2$ on $]0, 1[$. It is then easy to see that the
right-hand side of (3.19), as a function of $t$, attains its maximum value
$\varphi - \varphi^2 = \varphi^2 - \varphi^4 = \sqrt{5} - 2$ at $t(x) = 2x^{-1} \log(1/\varphi) \to \infty$ ($x \downarrow 0$). Hence
$V(t(x)) \ge (\sqrt{5} - 2)\,\Delta\nu(x)$, and the first inequality in (3.18) follows. □
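The constant $\sqrt{5} - 2$ can be checked numerically. The sketch below is only an illustration (the grid search and the function names are assumptions): it maximises $\min\{y - y^2,\, y^2 - y^4\}$ over a grid on $[0, 1]$ and evaluates the time $t(x)$ at which $y = e^{-tx/2}$ hits the golden ratio.

```python
import math

def lower_envelope(y):
    """Smaller of the two endpoint values in (3.19), with y = exp(-t x / 2)."""
    return min(y - y**2, y**2 - y**4)

def maximise_on_grid(n=200_000):
    """Grid search for the maximiser of lower_envelope on [0, 1]."""
    best_y, best_val = 0.0, 0.0
    for i in range(n + 1):
        y = i / n
        val = lower_envelope(y)
        if val > best_val:
            best_y, best_val = y, val
    return best_y, best_val

def t_star(x):
    """t(x) = (2 / x) * log(1 / phi), the time at which exp(-t x / 2) = phi."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    return 2.0 / x * math.log(1.0 / phi)
```

Up to the grid resolution, the maximiser should sit at $\varphi \approx 0.618$ with maximum value $\sqrt{5} - 2 \approx 0.236$.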

Our next goal is to characterize the link between the upper (lower) bounds
on the values of the function $\Delta\nu(x)$ (for small $x$) and the lagged frequency
ratios $p_{j+k}/p_j$ (for large $j$) with regard to the threshold value 1/2.

Lemma 3.9. For a given positive integer $k$, the bound

(3.20)  $\Delta\nu(x) \le k$

is valid for all sufficiently small $x > 0$ if and only if the condition

(3.21)  $\frac{p_{j+k}}{p_j} \le \frac{1}{2}$

is satisfied for all sufficiently large $j$. The similar assertion holds true when
the sign $\le$ in both (3.20) and (3.21) is replaced by $\ge$.

Proof. The first part of the lemma (i.e., with $\le$) is just a reformulation
of definitions (see (3.13)). Indeed, applying (3.20) with $x = p_j$ implies
$p_{j+k} \le p_j/2$, which is (3.21). Conversely, if $p_j \le x < p_{j-1}$ then by (3.21) we have
$p_{j+k} \le p_j/2 \le x/2$, and hence $\Delta\nu(x) = \nu]x/2, x] \le k$ as required by (3.20).

The "mirror" part (i.e., with $\ge$) needs a bit more care. First, note that
it suffices to prove the "only if" statement in the case where $p_j > p_{j+1}$, for
if $p_j = p_r$ ($r > j$) then $p_{j+k}/p_j \ge p_{r+k}/p_r$. Now, if $x \in [p_{j+1}, p_j[$ then the
condition $\Delta\nu(x) \ge k$ implies that $p_{j+k} > x/2$, whence by letting $x \uparrow p_j$ we
get $p_{j+k} \ge p_j/2$. Similarly, the "if" part follows by noting that $p_{j+k} \ge p_j/2$
implies $\Delta\nu(x) \ge k$ for each $x \in [p_{j+1}, p_j[$. □
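The correspondence in Lemma 3.9 can be illustrated on a concrete sequence. The sketch below assumes unnormalised geometric weights $q^j$ with $q = 0.6$ (an illustrative choice) and hypothetical helper names; it computes $\Delta\nu$ by direct counting, and the smallest lag $k$ for which the ratio condition (3.21) holds.

```python
def delta_nu(x, p):
    """Delta nu(x): number of frequencies p_j in the interval ]x/2, x]."""
    return sum(1 for pj in p if x / 2.0 < pj <= x)

def max_delta_nu(p, x_max):
    """Largest value of Delta nu on ]0, x_max].

    Delta nu is piecewise constant and right-continuous, jumping only at the
    points p_j and 2 p_j, so scanning those points is enough.
    """
    points = [pt for pj in p for pt in (pj, 2.0 * pj) if pt <= x_max]
    return max(delta_nu(pt, p) for pt in points)

def smallest_lag(p, j_min=0):
    """Smallest k with p[j + k] <= p[j] / 2 for all available j >= j_min."""
    k = 1
    while any(p[j + k] > p[j] / 2.0 for j in range(j_min, len(p) - k)):
        k += 1
    return k

# Illustrative weights (an assumption): q^j with q = 0.6, so q > 1/2 >= q^2.
q = 0.6
p = [q ** j for j in range(1, 121)]
```

Here the ratio condition first holds with $k = 2$, and correspondingly $\Delta\nu$ never exceeds 2 but does attain it.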

3.6. Refined asymptotic estimates. By Lemma 3.9 and the inequality
(3.18), the upper bound (3.20) implies $v \le w \le k$. In some cases, however,
such an estimate may not be sharp, as the next example demonstrates.

Example 3.10. Let $p_j = j\,2^{-j} \in RT_{1/2}$, so by Theorem 1.1 we have
$\lim_{t\to\infty} V(t) = 1$. On the other hand, (3.21) holds starting from $k = 2$,
which leads to the crude bound $v \le 2$. An inspection shows that $\Delta\nu(\cdot) = 1$
on $[2p_{i+1}, p_{i-1}[$ and $\Delta\nu(\cdot) = 2$ on $[p_i, 2p_{i+1}[$ ($i \ge 4$). For a given
$x \in [p_j, p_{j-1}[$, "excess" over the value 1 on the interval $]0, x]$ occurs on a set of
total Lebesgue measure bounded by $\sum_{i \ge j} (2p_{i+1} - p_i) = \sum_{i \ge j} 2^{-i} = 2^{-j+1}$,
which is small as compared to $x \ge p_j$ ($j \to \infty$).
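The slow approach of $V(t)$ to 1 in Example 3.10 can be observed numerically. The following sketch is an illustration only: it uses the unnormalised weights $j\,2^{-j}$ as written in the example (rescaling all $p_j$ by a constant merely rescales $t$), truncated at an assumed index 300, and the function names are hypothetical.

```python
import math

def variance(t, p):
    """V(t) = sum_j (exp(-t p_j) - exp(-2 t p_j))."""
    return sum(math.exp(-t * pj) - math.exp(-2.0 * t * pj) for pj in p)

def relative_excess_bound(j):
    """Bound 2^{-j+1} on the excess measure, relative to p_j = j 2^{-j}."""
    return 2.0 ** (-j + 1) / (j * 2.0 ** (-j))

# Weights p_j = j 2^{-j}, j = 1..300 (the truncation point is an assumption).
p = [j * 2.0 ** (-j) for j in range(1, 301)]

v_large = variance(2.0 ** 40, p)
```

The relative excess bound equals $2/j$, which suggests a deviation from 1 of order $1/\log t$; at $t = 2^{40}$ the computed value should lie slightly above 1.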

This example suggests the following refinement of Lemma 3.9.

Lemma 3.11. If for some $k \in \mathbb{N}$ the frequencies $(p_j)$ satisfy

(3.22)  $\limsup_{j\to\infty} \frac{p_{j+k}}{p_j} \le \frac{1}{2},$

then $\limsup_{t\to\infty} V(t) \le k$. The assertion remains valid when the symbols $\le$
and limsup are simultaneously replaced by $\ge$ and liminf.

Proof. It suffices to assume that $k = 1$, as the general case would then
follow by the additivity argument (see the remark after Example 2.2).
According to (3.22) (with $k = 1$), for any $\varepsilon \in\, ]0, 1/5]$ and all sufficiently large $i$
we have $p_{i+1}/p_i \le 1/2 + \varepsilon$. Hence, $p_{i+1}/p_{i-1} \le (1/2 + \varepsilon)^2 \le 49/100 < 1/2$,
and Lemma 3.9 implies that $\Delta\nu(x) \le 2$ for all sufficiently small $x$.

On the other hand, using the definition of the function $\Delta\nu(\cdot)$ one can
check that $\Delta\nu(x) \le 1$ when $x \in [p_i \vee (2p_{i+1}),\, p_{i-1}[$. That is to say, the value
$\Delta\nu(x) = 2$ may only occur on a subset of $[p_i, p_{i-1}[$ with Lebesgue measure
not exceeding $(2p_{i+1} - p_i) \vee 0$ (here $a \wedge b := \min\{a, b\}$, $a \vee b := \max\{a, b\}$).
Therefore, for $x \in [p_j, p_{j-1}[$ we have

$\int_{p_j}^{x} \Delta\nu(u)\,du \le x - p_j + (2p_{j+1} - p_j) \vee 0 \le x - p_j + 2\varepsilon p_j.$

Inserting these estimates into (3.16), we obtain for $x \in [p_j, p_{j-1}[$

$D(x) = \int_{p_j}^{x} \Delta\nu(u)\,du + \sum_{i>j} \int_{p_i}^{p_{i-1}} \Delta\nu(u)\,du \le x + 2\varepsilon \sum_{i \ge j} p_i \le x + 2\varepsilon p_j \sum_{l=0}^{\infty} \left(\tfrac{1}{2} + \varepsilon\right)^l = x + \frac{4\varepsilon p_j}{1 - 2\varepsilon}.$

It follows that

$\frac{D(x)}{x} \le 1 + \frac{4\varepsilon p_j}{(1 - 2\varepsilon)\,x} \le 1 + \frac{4\varepsilon}{1 - 2\varepsilon} \to 1 \qquad (\varepsilon \to 0),$

hence $\limsup_{x\downarrow 0} D(x)/x \le 1$. Finally, applying to (3.17) the "limsup" part
of Fatou's lemma [16, §IV.2], we obtain

$\limsup_{t\to\infty} \int_0^\infty e^{-y}\, y\, \frac{D(y/t)}{y/t}\,dy \le \int_0^\infty e^{-y}\, y\,dy = 1,$

and the first half of the lemma is proved.

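For the second half (with $\ge$ and liminf), suppose again that $k = 1$.
According to (3.22), for any $\varepsilon \in\, ]0, 1/2[$ and all sufficiently large $i$, we have
$p_i/p_{i-1} \ge 1/2 - \varepsilon$. Observe that possible deviations of the function $\Delta\nu(\cdot)$
from value 1 may only occur as follows: if $p_{i-1} < 2p_i$ then $\Delta\nu(x) \ge 2$ for
$x \in [p_{i-1}, 2p_i[$, while if $p_{i-1} > 2p_i$ then $\Delta\nu(x) = 0$ for $x \in [2p_i, p_{i-1}[$. In
either case, the contribution of the interval with the endpoints $p_{i-1}$ and $2p_i$
to the integral $\int_0^x (\Delta\nu(u) - 1)\,du = D(x) - x$ is bounded from below by

$\int_{p_{i-1} \wedge (2p_i)}^{p_{i-1} \vee (2p_i)} (\Delta\nu(u) - 1)\,du \ge 2p_i - p_{i-1}.$

Using this remark, for a given $x \in\, ]p_j, p_{j-1}]$ we obtain

$\int_0^x (\Delta\nu(u) - 1)\,du \ge (2p_j - p_{j-1}) \wedge 0 + \sum_{i>j} (2p_i - p_{i-1})$
$\qquad = (2p_j - p_{j-1}) \wedge 0 - p_j + \sum_{i>j} p_i$
$\qquad \ge -\frac{4\varepsilon p_j}{1 - 2\varepsilon} - p_j + p_j \sum_{l=1}^{\infty} \left(\tfrac{1}{2} - \varepsilon\right)^l$
$\qquad = -\frac{4 p_j \varepsilon}{1 - 2\varepsilon} - \frac{4 p_j \varepsilon}{1 + 2\varepsilon} = -\frac{8 p_j \varepsilon}{1 - 4\varepsilon^2}.$

Hence

$\frac{D(x)}{x} \ge 1 - \frac{8 p_j \varepsilon}{(1 - 4\varepsilon^2)\,x} \ge 1 - \frac{8\varepsilon}{1 - 4\varepsilon^2},$

and since $\varepsilon$ is arbitrary, it follows that $\liminf_{x\downarrow 0} D(x)/x \ge 1$. It remains to
use Fatou's lemma in (3.17) to conclude that $\liminf_{t\to\infty} V(t) \ge 1$. □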

Corollary 3.12. Suppose that the condition (1.2) is satisfied for some
$k \in \mathbb{N}$, that is, $p_{j+k}/p_j \to 1/2$ as $j \to \infty$. Then $V(t) \to k$ as $t \to \infty$.

Proof. Readily follows by combining the two halves of Lemma 3.11. □
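A quick numerical illustration of Corollary 3.12, under assumptions chosen for convenience: unnormalised geometric weights $q^j$ with $q = 2^{-1/k}$ and $k = 2$, so that $p_{j+2}/p_j = 1/2$ exactly while $p_{j+1}/p_j > 1/2$. The helper names are hypothetical. In this case the sum defining $V(t)$ telescopes, so the limit 2 is reached very quickly.

```python
import math

def variance(t, p):
    """V(t) = sum_j (exp(-t p_j) - exp(-2 t p_j))."""
    return sum(math.exp(-t * pj) - math.exp(-2.0 * t * pj) for pj in p)

def lagged_ratio(p, k, j):
    """Ratio p[j + k] / p[j] (0-based indices)."""
    return p[j + k] / p[j]

# Illustrative weights (an assumption): q^j with q = 2^{-1/2}, j = 1..400.
k = 2
q = 2.0 ** (-1.0 / k)
p = [q ** j for j in range(1, 401)]

v_limit = variance(1.0e6, p)
```

With these weights `v_limit` should agree with $k = 2$ up to rounding.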

Note that Corollary 3.12 is exactly the "if" part of Theorem 1.1. In
Section 5 below, where the issue of converging variance is considered in detail,
we will give a direct, shorter proof of the sufficiency of the condition (1.2).

Example 3.13. Note that a converse statement to either half of Lemma
3.11 is not valid. Indeed, if $(p_j) \in RT_q$ with $q \in [0, 1/2[$, then $\Delta\nu(\cdot) = 1$
on $[p_i, 2p_i[$ and $\Delta\nu(\cdot) = 0$ on $[2p_i, p_{i-1}[$ (for $i$ large enough). This implies
that the graph $y = D(x)/x$ consists of arcs of hyperbolas with alternating
monotonicity (supported on intervals of the form $[p_i, 2p_i[$ and $[2p_i, p_{i-1}[$),
and in particular

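$\max_{x \in [p_j,\,p_{j-1}]} \frac{D(x)}{x} = \frac{D(2p_j)}{2p_j} = \frac{1}{2p_j} \sum_{i \ge j} p_i = \frac{1 + \rho_j}{2},$

(3.23)

$\min_{x \in [p_j,\,2p_j]} \frac{D(x)}{x} = \frac{D(p_j)}{p_j} = \frac{1}{p_j} \sum_{i > j} p_i = \rho_j,$

where $\rho_j = p_j^{-1} \sum_{i>j} p_i$ (cf. (1.7)). The $RT_q$-condition implies that
$\rho_j \to q/(1 - q)$ as $j \to \infty$, so from (3.23) we get

(3.24)  $\frac{q}{1 - q} \le \liminf_{x\downarrow 0} \frac{D(x)}{x} \le \limsup_{x\downarrow 0} \frac{D(x)}{x} \le \frac{1}{2(1 - q)}.$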

In particular, setting $q = 0$ (e.g., when $(p_j)$ is a Poisson distribution) and
taking a "doubled" sequence (i.e., determined by $\nu(dx) = \sum_{j \ge 1} 2\delta_{p_j}(dx)$),
by the additivity argument we get $\limsup_{t\to\infty} V(t) \le 2 \cdot (1/2) = 1$, while
$\limsup_{j\to\infty} p_{j+1}/p_j = 1$. Likewise, choosing $q = 1/3$ and again doubling the
sequence, from (3.24) and by Fatou's lemma applied to (3.17), we obtain
that $\liminf_{t\to\infty} V(t) \ge 2 \cdot (1/2) = 1$, whereas $\liminf_{j\to\infty} p_{j+1}/p_j = q = 1/3$.

3.7. Proof of Theorem 1.2. We are now in a position to prove Theorem
1.2, and let us start by proving its poissonized version. By Lemma 3.8, the
conditions $v < \infty$ and $w < \infty$ are equivalent, and according to the first half
of Lemma 3.9, the latter condition holds if and only if (3.21) is satisfied for
some $k \in \mathbb{N}$, which is equivalent to (1.6) (possibly, with a bigger $k$).

The second part of the theorem (leading to the estimate $v \le k$) is settled
by Lemma 3.11, since condition (3.22) of the lemma coincides with condition
(1.6) of the theorem.

Furthermore, by (2.15) the condition $\limsup_{n\to\infty} V_n < \infty$ is equivalent to
$v < \infty$, in which case also $\limsup_{n\to\infty} V_n = v$ by Corollary 3.3.
Finally, the optimality of the bound $v \le k$ follows by merging $k$ geometric
sequences with ratio $q = 1/2$ each and using the additivity argument
(alternatively, one can consider the geometric frequencies with ratio $q = 2^{-1/k}$).
Thus, the proof of Theorem 1.2 is complete.

3.8. Comment on the threshold constant. Let us remark that the
threshold 1/2 in Theorem 1.2 is chosen to match neatly with Theorem 1.1.
Replacing 1/2 in (1.6) by some other value $0 < q < 1$ would lead to a more
sophisticated upper bound

(3.25)  $\limsup_{n\to\infty} V_n \le k\,\lceil \log_{1/q} 2 \rceil,$

where $\lceil x \rceil := \min\{m \in \mathbb{Z} : m \ge x\}$ is the ceiling integer part of $x$. Indeed,
iterating the condition $\limsup_{j\to\infty} p_{j+k}/p_j \le q$, we get $\limsup_{j\to\infty} p_{j+ik}/p_j \le q^i \le 1/2$,
provided that $i \ge \lceil \log_{1/q} 2 \rceil$, and (3.25) follows by Lemma 3.11.

In fact, the constant $\lceil \log_{1/q} 2 \rceil$ here has the meaning of an upper bound
for $\limsup_{n\to\infty} V_n$ in the geometric case with ratio $q$. Note that the
representation (1.5) leads to a similar (in general, slightly better) estimate
$\limsup_{n\to\infty} V_n \le \log_{1/q} 2 + \max(\delta_q(\cdot))$ (cf. (3.25)).
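The bound (3.25) is easy to tabulate. The sketch below is an illustration with hypothetical function names; it evaluates $k\lceil \log_{1/q} 2\rceil$ and, for unnormalised geometric weights with the assumed ratio $q = 0.6$, compares it with the poissonized variance $V(t)$, which stays close to $\log_{1/q} 2$. Note that for ratios such as $q = 2^{-1/k}$, where $\log_{1/q} 2$ is an integer, the floating-point ceiling is sensitive to rounding.

```python
import math

def ceiling_bound(q, k=1):
    """Right-hand side of (3.25): k * ceil(log_{1/q} 2)."""
    return k * math.ceil(math.log(2.0) / math.log(1.0 / q))

def variance(t, p):
    """V(t) = sum_j (exp(-t p_j) - exp(-2 t p_j))."""
    return sum(math.exp(-t * pj) - math.exp(-2.0 * t * pj) for pj in p)

# Illustrative weights (an assumption): q^j with q = 0.6, j = 1..200.
q = 0.6
p = [q ** j for j in range(1, 201)]
log_value = math.log(2.0) / math.log(1.0 / q)   # log_{1/q} 2, about 1.357
```

For $q = 0.6$ the bound equals 2, whereas $V(t)$ hovers around $\log_{1/q} 2 \approx 1.357$ for large $t$.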
4. Convergence to infinity. In this section, we establish new sufficient
conditions in order that $V(t) \to \infty$ as $t \to \infty$ (which, in view of (2.15), is
equivalent to $V_n \to \infty$ as $n \to \infty$). Note that the combination of Theorems
4.1 and 4.3 (to be proved in Sections 4.1 and 4.2, respectively) along with
discussion in Section 4.4 will settle Theorem 1.3 stated in the Introduction.

4.1. First set of sufficient conditions. It is natural to seek a condition
for $V(t) \to \infty$ based on the representation (2.12), that is, in terms of the
function $\Delta\nu(x)$. In turn, such a condition may be transformed into the
information about the lagged ratio $p_{j+k}/p_j$ (cf. Theorem 1.2).

Theorem 4.1. The condition

(4.1)  $\lim_{x\downarrow 0} \Delta\nu(x) = \infty$

implies that

(4.2)  $\forall k \in \mathbb{N}, \quad \liminf_{j\to\infty} \frac{p_{j+k}}{p_j} \ge \frac{1}{2},$

which in turn implies that $V(t) \to \infty$ as $t \to \infty$.

Proof. If condition (4.1) holds then for any $k \in \mathbb{N}$ we have $\Delta\nu(x) \ge k$ for
all sufficiently small $x > 0$. By Lemma 3.9, this implies that $p_{j+k}/p_j \ge 1/2$
for all $j$ large enough, and (4.2) follows. Further, condition (4.2) implies
convergence of $V(t)$ to infinity by Lemma 3.11. □

Note that condition (4.2) is obviously fulfilled for any sequence $(p_j)$ from
$RT_1$, in which case it is well known that $V(t) \to \infty$ [13], [25]. The next
example demonstrates that there are instances of frequencies $(p_j)$ satisfying
(4.1) but not in $RT_1$. This example will also show that conditions (4.1) and
(4.2) of Theorem 4.1 are not necessary in order that $V(t) \to \infty$.

Example 4.2. Let $0 < q < 1$ and suppose that the sequence $(p_j)$ consists
of the values $q^i$, each repeated $i$ times ($i = 1, 2, \dots$), which corresponds to
the measure $\nu(dx) = \sum_{i \ge 1} i\,\delta_{q^i}(dx)$. Note that the sequence $(p_j)$ is not in
any RT-class, since $\limsup_{j\to\infty} p_{j+1}/p_j = 1$ but $\liminf_{j\to\infty} p_{j+1}/p_j = q$.
However, for any $q \in\, ]0, 1[$ we have $V(t) \to \infty$, since for $t \in [q^{-j}, q^{-j-1}]$

$V(t) = \sum_{i=1}^{\infty} i \left( e^{-q^i t} - e^{-2q^i t} \right) \ge j \left( e^{-q^j t} - e^{-2q^j t} \right)$
$\qquad \ge j \min_{1 \le y \le 1/q} \left( e^{-y} - e^{-2y} \right) = j \left( e^{-1/q} - e^{-2/q} \right) \to \infty \qquad (j \to \infty).$

If $1/2 \le q < 1$ then for $x \in [q^j, q^{j-1}[$ we have $\Delta\nu(x) \ge j \to \infty$ as $x \downarrow 0$, and
condition (4.1) is valid. On the other hand, if $0 < q < 1/2$ then $\Delta\nu(x) = 0$
for $x \in [2q^j, q^{j-1}[$, hence $\liminf_{x\downarrow 0} \Delta\nu(x) = 0$ and (4.1) fails. Also, for any
$k \ge 1$, we have $\liminf_{j\to\infty} p_{j+k}/p_j = q < 1/2$, so condition (4.2) is not valid.

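The lower bound used in Example 4.2 can be verified directly. The following sketch is illustrative only (the function names, the truncation level and the choice $q = 1/2$ are assumptions); it evaluates $V(t)$ for the measure $\sum_i i\,\delta_{q^i}$ and compares it with $j\,(e^{-1/q} - e^{-2/q})$ at several times $t \in [q^{-j}, q^{-j-1}]$.

```python
import math

def variance_repeated(t, q, i_max=400):
    """V(t) when the value q^i is repeated i times (i = 1..i_max)."""
    return sum(i * (math.exp(-t * q**i) - math.exp(-2.0 * t * q**i))
               for i in range(1, i_max + 1))

def lower_bound(j, q):
    """j * (exp(-1/q) - exp(-2/q)), valid for q^{-j} <= t <= q^{-j-1}."""
    return j * (math.exp(-1.0 / q) - math.exp(-2.0 / q))
```

Since only the single term $i = j$ is retained in the bound, the computed variance should dominate it for every admissible $t$, and it should grow roughly linearly in $j$.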
4.2. Another set of conditions. A different sufficient condition exploits
the link between $V(t)$ and the mean number of singleton boxes as in
Lemma 3.4. An equivalent condition may be set in terms of the tail ratio
$\rho_j = p_j^{-1} \sum_{i>j} p_i$ (see (1.7)). Recall the definition (3.7) of the measure $\tilde\nu$.

Theorem 4.3. The condition

(4.3)  $\lim_{x\downarrow 0} \frac{\tilde\nu[0, x]}{x} = \infty$

is equivalent to

(4.4)  $\lim_{j\to\infty} \rho_j = \infty,$

and each one implies that $V(t) \to \infty$ as $t \to \infty$.

Proof. By the estimate (3.12), condition (4.3) implies $\Phi_1(t) \to \infty$, which
is equivalent to $V(t) \to \infty$ by (3.11). So it remains to show that (4.3) and
(4.4) are equivalent to each other. Observe that for $p_{j+1} \le x < p_j$ we have
$x^{-1}\tilde\nu[0, x] \ge \rho_j$, hence (4.4) implies (4.3). To prove the converse, note that
if $p_{j+1} = p_j$ then $\rho_j = 1 + \rho_{j+1}$, so it suffices to consider the case where
$p_{j+1} < p_j$. Then

$\rho_j = \inf_{p_{j+1} \le x < p_j} \frac{\tilde\nu[0, x]}{x} \to \infty \qquad (j \to \infty),$

when the condition (4.3) holds, and hence (4.4) follows. □

4.3. A counterexample to Theorem 4.3. We construct here an example
demonstrating that conditions (4.3), (4.4) are not necessary in order that
$V(t) \to \infty$ (or, equivalently, $\Phi_1(t) \to \infty$). In particular, due to the
estimate (3.10) (with $r = 2$), this example will show that $V(t) \to \infty$ does not
necessarily imply $\Phi_2(t) \to \infty$. On the other hand, in view of the inequality

$2\Phi_2(t) \ge \sum_{1/2 < t p_j \le 1} (t p_j)^2 e^{-t p_j} \ge \tfrac{1}{4} e^{-1}\, \nu]1/(2t), 1/t] = (4e)^{-1} \Delta\nu(1/t),$

it is a priori clear that $\Phi_2(t)$ cannot be uniformly bounded in such a situation,
because $w = \infty$ according to (3.18).

Example 4.4. Let $k_0, k_1, k_2, \dots$ be an increasing integer sequence. Take
the frequencies $(p_j)$ in the form

(4.5)  $p_j = \begin{cases} 1/k_1, & 0 < j \le k_0, \\ 1/k_{i+1}, & k_0 + \dots + k_{i-1} < j \le k_0 + \dots + k_i \quad (i = 1, 2, \dots), \end{cases}$

which corresponds to the measure

(4.6)  $\nu(dx) = \sum_{j=1}^{\infty} \delta_{p_j}(dx) = \sum_{i=0}^{\infty} k_i\, \delta_{1/k_{i+1}}(dx).$

That is to say, the array of boxes is partitioned in blocks so that the $i$-th block
contains $k_i$ boxes of frequencies $1/k_{i+1}$ ($i = 0, 1, 2, \dots$).

The heuristics underlying this example is as follows. A prototype instance
is a block of $k$ equal boxes each with frequency, say, $q$. The mean number
of singleton boxes within the block is a single-wave function $k t q e^{-tq}$, which
increases to its maximum $k/e$ at time $t = 1/q$ and then goes down to 0.
Now, the idea is to combine a series of such blocks in order to guarantee a
suitable overlap of the waves produced by successive blocks. If the sequence
$(k_i)$ grows fast enough, then for each $i = 0, 1, 2, \dots$ there exists a time
instant (of order of $k_{i+1}$) when boxes belonging to the $i$-th block start to get
occupied. After some time, the mean number of singletons among these
boxes is still relatively large, say not less than $\log\log k_i$, but the expected
number of balls that fall in boxes of further blocks becomes large too, and
almost all these balls produce singleton boxes, since $k_{i+2}$ is yet much larger
(hence the frequencies are smaller). As time passes, all boxes belonging to
blocks $0, 1, \dots, i$ are likely to contain more than one ball each, while the
balls hitting other blocks remain sole representatives of their boxes.

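To make this heuristic work, we choose

(4.7)  $k_i = 2^{2^i}, \qquad i = 0, 1, 2, \dots,$

so that $k_{i+1} = k_i^2$ for all $i$. We wish to check that $\Phi_1(t)$ goes to infinity but
$\Phi_2(t)$ does not. Using the definitions of $\Phi_1$ and $\Phi_2$ together with (4.6), we have

(4.8)  $\Phi_1(t) = t \int_0^\infty x e^{-tx}\,\nu(dx) = t \sum_{i=0}^{\infty} \frac{k_i}{k_{i+1}}\, e^{-t/k_{i+1}} =: \sum_{i=0}^{\infty} A_i(t),$

(4.9)  $\Phi_2(t) = \frac{t^2}{2} \int_0^\infty x^2 e^{-tx}\,\nu(dx) = \frac{t^2}{2} \sum_{i=0}^{\infty} \frac{k_i}{k_{i+1}^2}\, e^{-t/k_{i+1}} =: \frac{1}{2} \sum_{i=0}^{\infty} B_i(t).$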

As a function of $t$, each summand $A_i(t)$ in the sum (4.8) increases up to the
maximum value $A_i(t_i^*) = k_i e^{-1}$ attained at $t_i^* = k_{i+1}$, and then decreases to
zero. Two consecutive summands, $A_i(t)$ and $A_{i+1}(t)$, are equal at the point

$t_i' = \frac{k_{i+1}^2}{k_{i+1} - 1}\, \log k_i,$

where their common value is

$A_i(t_i') = A_{i+1}(t_i') = \frac{k_{i+1}}{k_{i+1} - 1}\; k_i^{-1/(k_{i+1} - 1)} \log k_i.$

Using the elementary inequality $k^{-1/(k^2 - 1)} \ge e^{-1}$ ($k > 1$), we note that

$A_i(t_i') \ge k_i^{-1/(k_i^2 - 1)} \log k_i \ge e^{-1} \log k_i.$

Since $t_{i-1}' < t_i^* < t_i'$ ($i = 1, 2, \dots$), it follows that for all $t \in [t_{i-1}', t_i']$,

$\Phi_1(t) \ge A_i(t) \ge e^{-1} \log k_{i-1},$

hence

$\liminf_{t\to\infty} \Phi_1(t) \ge e^{-1} \liminf_{i\to\infty} \log k_{i-1} = \infty.$

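Turning to $\Phi_2(t)$, note that the summand $B_i(t)$ in (4.9) attains its
maximum value at the point $t = 2t_i^* = 2k_{i+1}$, and $B_i(2t_i^*) = 4e^{-2} k_i$, so

$\Phi_2(2t_i^*) \ge \tfrac{1}{2} B_i(2t_i^*) = 2e^{-2} k_i \to \infty \qquad (i \to \infty).$

On the other hand, on the sequence $t_j'' := 3k_{j+1} \log k_j$ one has

$B_j(t_j'') = \frac{9 \log^2 k_j}{k_j^2}.$

Setting $x = k_{i+1}$ and $a = k_{j+1} \log k_j$, we note that the function $x^{-3/2} e^{-3a/x}$
increases for $0 < x < 2a$. Hence, for all $i = 0, 1, \dots, j$,

$B_i(t_j'') \le B_j(t_j''),$

and therefore

(4.10)  $\sum_{i=0}^{j} B_i(t_j'') \le (j + 1)\, B_j(t_j'') = \frac{9 (j + 1) \log^2 k_j}{k_j^2}.$

For $i \ge j + 1$, we have

$B_i(t_j'') \le \frac{9 k_{j+1}^2 \log^2 k_j \; k_i}{k_{i+1}^2} = \frac{9 k_{j+1}^2 \log^2 k_j}{k_i k_{i+1}} \le \frac{9 \log^2 k_j}{k_i},$

and since $k_i = 2^{2^i} \ge 2^{3i}$ for $i \ge 4$, it follows

(4.11)  $\sum_{i=j+1}^{\infty} B_i(t_j'') \le 9 \cdot 2^{2j} \sum_{i=j+1}^{\infty} 2^{-3i} = \frac{9}{7 \cdot 2^{j}}.$

Combining the estimates (4.10) and (4.11) yields $\Phi_2(t_j'') \to 0$ as $j \to \infty$.
Thus $\Phi_2(t)$ does not have a limit as $t \to \infty$, and moreover

$\liminf_{t\to\infty} \Phi_2(t) = 0, \qquad \limsup_{t\to\infty} \Phi_2(t) = \infty.$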

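Finally, it is easy to see directly that in this example the limit in (4.4)
does not exist. Indeed, along the subsequence $j = k_0 + k_1 + \dots + k_i$, according
to (1.7) and (4.5),

$\rho_j = k_{i+1} \left( \frac{k_{i+1}}{k_{i+2}} + \frac{k_{i+2}}{k_{i+3}} + \cdots \right) = 1 + O(k_{i+1}^{-1}) \to 1 \qquad (i \to \infty).$

On the other hand, for $j = k_0 + k_1 + \dots + k_i + 1$ we have

$\rho_j = k_{i+2} \left( \frac{k_{i+1} - 1}{k_{i+2}} + \frac{k_{i+2}}{k_{i+3}} + \cdots \right) > k_{i+1} - 1 \to \infty \qquad (i \to \infty).$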

Karlin [25, page 384] gives an example of frequencies for which $V(t)$
converges to 0 along a sequence of values of $t$, and converges to $\infty$ along another
sequence; in that case $\Phi_1(t)$ demonstrates the same type of behavior. Our
Example 4.4 exhibits a more exotic "second order" pathology: this time,
$\Phi_1(t) \to \infty$ but $\Phi_2(t)$ oscillates between 0 and $\infty$.

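The oscillation of $\Phi_2$ against the growth of $\Phi_1$ in Example 4.4 can be seen numerically. The sketch below is an illustration under stated assumptions: the sums (4.8) and (4.9) are truncated to the blocks $i = 0, \dots, 8$ (so it is only meaningful for $t$ far below $k_9 = 2^{512}$), the relation $k_{i+1} = k_i^2$ is used to avoid overflow, and all function names are hypothetical.

```python
import math

# k_i = 2^(2^i); floats cover i = 0..8 safely (k_8^2 = 2^512 is still finite).
K = [2.0 ** (2 ** i) for i in range(9)]

def phi1(t):
    """Truncated sum of A_i(t) = (t / k_i) exp(-t / k_i^2)."""
    return sum((t / k) * math.exp(-t / (k * k)) for k in K)

def phi2(t):
    """Truncated (1/2) * sum of B_i(t) = (t^2 / k_i^3) exp(-t / k_i^2)."""
    return 0.5 * sum((t / k) ** 2 / k * math.exp(-t / (k * k)) for k in K)

def t_peak(i):
    """2 t*_i = 2 k_{i+1}, the point where B_i is maximal."""
    return 2.0 * K[i] * K[i]

def t_trough(j):
    """t''_j = 3 k_{j+1} log k_j, along which Phi_2 becomes small."""
    return 3.0 * K[j] * K[j] * math.log(K[j])
```

Along `t_trough(j)` the value of `phi2` should be small while `phi1` is of order $3 \log k_j$; along `t_peak(i)` the value of `phi2` is at least about $2e^{-2} k_i$.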
4.4. Relationship between the various sufficient conditions. First of all,
note that condition (4.2) in Theorem 4.1 does not imply condition (4.1). A
counterexample may be constructed by a slight modification of Example 4.2
as follows: define the frequencies $(p_j)$ by setting $\nu(dx) = \sum_{i \ge 1} i\,\delta_{\tilde p_i}(dx)$, where
$\tilde p_i := i^{-2} 2^{-i}$, then $\liminf_{j\to\infty} p_{j+k}/p_j = \liminf_{i\to\infty} \tilde p_{i+1}/\tilde p_i = 1/2$ (so that
(4.2) is satisfied), but for $(i + 1)^{-2} 2^{-i} \le x < i^{-2} 2^{-i}$ we have $\Delta\nu(x) = 0$,
hence $\liminf_{x\downarrow 0} \Delta\nu(x) = 0$ and (4.1) fails.

Further, it is easy to see that condition (4.2) in Theorem 4.1 implies the
set of equivalent conditions (4.3), (4.4) in Theorem 4.3, but not the other
way around. Indeed, if (4.2) is satisfied then for $\rho_j$ defined in (1.7) we have

$\liminf_{j\to\infty} \rho_j \ge \liminf_{j\to\infty} \sum_{k=1}^{M} \frac{p_{j+k}}{p_j} \ge \frac{M}{2} \to \infty \qquad (M \to \infty),$

and condition (4.4) follows. On the other hand, we have seen that in Example
4.2 condition (4.2) fails when $0 < q < 1/2$, while for $q^j \le x < q^{j-1}$ we have

$\frac{\tilde\nu[0, x]}{x} = \frac{1}{x} \sum_{i \ge j} i\,q^i \ge \frac{j}{q^{j-1}} \sum_{i \ge j} q^i = \frac{j\,q}{1 - q} \to \infty \qquad (j \to \infty),$

and the condition (4.3) is valid.

As Example 4.4 shows, a converse to Theorem 4.3 is not valid, except
under further assumptions on the measure $\tilde\nu$ (cf. [13], [25]). For instance, if
$\tilde\nu[0, x]$ varies regularly at zero, then Karamata's Tauberian theorem (see [8,
§1.7.2] or [16, §XIII.5]) applied to the integral representation of $\Phi_1$ yields
$\tilde\nu[0, x]/x \sim c\,\Phi_1(1/x)$ as $x \downarrow 0$,
so that the convergence $\Phi_1(t) \to \infty$ as $t \to \infty$ does imply the condition (4.3).

Remark. By Karamata's Tauberian theorem, the convergence

$\Phi_1(t) = t \int_0^\infty e^{-tx}\,\tilde\nu(dx) \to c \qquad (t \to \infty)$

is equivalent to $\tilde\nu[0, x]/x \to c$ as $x \downarrow 0$. Interestingly, the implication may
fail for $c = \infty$, as Example 4.4 demonstrates.

5. Convergence to a finite limit. We will now investigate the
situation where the variance $V(t)$ has a finite limit as $t \to \infty$, which is the central
topic of this work (see Theorem 1.1). As already mentioned in Section 3.6,
the "if" part of Theorem 1.1 follows from Corollary 3.12. So the main goal of
this section is to prove the "only if" part (i.e., the necessity of the condition
(1.2)), but we will also give a streamlined proof of the sufficiency.



5.1. Criterion of convergence. Recall that $D(\cdot)$ is a primitive function of
$\Delta\nu(\cdot)$, defined by (3.16).

Lemma 5.1. In order that there exist a finite limit

(5.1)  $\lim_{t\to\infty} V(t) =: v,$

it is necessary and sufficient that

(5.2)  $\lim_{x\downarrow 0} \frac{D(x)}{x} = v.$

Proof. Note that, according to (3.6), $v > 0$. By the representation
(3.14), we can rewrite (5.1) as

(5.3)  $\int_0^\infty e^{-tx}\,dD(x) \sim \frac{v}{t} \qquad (t \to \infty).$

By Karamata's Tauberian theorem (see [8, §1.7.2], [16, §XIII.5]), the relation
(5.3) is equivalent to $D(x) \sim vx$ as $x \downarrow 0$, which is the same as (5.2). □
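The criterion (5.2) is easy to probe numerically, because $D$ has a closed form as a sum of clipped interval lengths. The sketch below is an illustration with hypothetical names and assumed unnormalised geometric weights $q^j$: for $q = 1/2$ the ratio $D(x)/x$ is essentially 1, whereas for $q = 0.6$ it keeps oscillating between about 1.3 (at $x = p_j$) and about 1.417 (at $x = 2p_{j+1}$), so no limit exists.

```python
def D(x, p):
    """D(x): integral of Delta nu over ]0, x], as a sum of clipped lengths."""
    return sum(max(0.0, min(x, 2.0 * pj) - pj) for pj in p)

def ratio(x, p):
    """The quantity D(x) / x appearing in criterion (5.2)."""
    return D(x, p) / x

def geometric(q, n=400):
    """Unnormalised geometric weights q^j, j = 1..n (an illustrative choice)."""
    return [q ** j for j in range(1, n + 1)]

p_half = geometric(0.5)
p_06 = geometric(0.6)
low = ratio(0.6 ** 20, p_06)          # x = p_20
high = ratio(2.0 * 0.6 ** 21, p_06)   # x = 2 p_21
```

This matches the dichotomy of the text: convergence of $D(x)/x$ for ratio 1/2, and persistent oscillation when $\log_{1/q} 2$ is not an integer.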
5.2. Some implications of convergence.

Lemma 5.2. Suppose that the limit (5.2) exists, and let $\alpha, \beta > 0$ be
arbitrary variables such that $\alpha, \beta \downarrow 0$ and $(\alpha + \beta)/(\beta - \alpha) = O(1)$. Then

$\lim_{\alpha,\beta\downarrow 0} \frac{D(\beta) - D(\alpha)}{\beta - \alpha} = v.$

Proof. Using (5.2), we have

$\frac{D(\beta) - D(\alpha)}{\beta - \alpha} = \frac{v\beta(1 + o(1)) - v\alpha(1 + o(1))}{\beta - \alpha} = v + \frac{o(1)\,(\alpha + \beta)}{\beta - \alpha} \to v,$

since the ratio $(\alpha + \beta)/(\beta - \alpha)$ is bounded. □

Lemma 5.3. If the finite limit (5.1) exists then the limiting value $v$ must
be a positive integer number, $v = k \in \mathbb{N}$, and in this case

(5.4)  $\lim_{x\downarrow 0} \frac{\lambda\{u \in\, ]0, x] : \Delta\nu(u) \ne k\}}{x} = 0,$

where $\lambda\{\cdot\}$ denotes Lebesgue measure on $\mathbb{R}_+$.

Proof. By Lemma 3.8, the function $\Delta\nu(u)$ is uniformly bounded. By
definition, it counts the number of frequencies $p_j$ in the interval $]u/2, u]$,
therefore $\Delta\nu(u)$ is piecewise constant, with jumps at points $u = p_j$ and
$u = 2p_j$. Thus, for any given interval $]x/2, x]$ the total number of such
jumps is uniformly bounded by a constant, say $M < \infty$.

Let $]\alpha, \beta[$ be the maximal open subinterval of $]x/2, x]$ on which $\Delta\nu(\cdot)$ is
constant. Clearly, its length satisfies $\beta - \alpha \ge x/(2(M + 1))$, thus

(5.5)  $0 < \frac{\beta + \alpha}{\beta - \alpha} \le \frac{2x}{x/(2(M + 1))} = 4(M + 1).$

Consider a closed interval $[\alpha_1, \beta_1] \subset\, ]\alpha, \beta[$ with $\alpha_1 = (3\alpha + \beta)/4$,
$\beta_1 = (3\beta + \alpha)/4$. Since $\alpha_1 + \beta_1 = \alpha + \beta$ and $\beta_1 - \alpha_1 = (\beta - \alpha)/2$, by the bound
(5.5) Lemma 5.2 applies to yield

(5.6)  $\frac{1}{\beta_1 - \alpha_1} \int_{\alpha_1}^{\beta_1} \Delta\nu(u)\,du = \frac{D(\beta_1) - D(\alpha_1)}{\beta_1 - \alpha_1} \to v \qquad (x \downarrow 0).$

But the function $\Delta\nu(\cdot)$ is constant on $]\alpha, \beta[\, \supset [\alpha_1, \beta_1]$, hence its sole (integer)
value must coincide with the asymptotic mean $v$ given by (5.6). In particular,
$v$ must be integer, $v = k \in \mathbb{N}$.






Along the same lines, one can show that for any $\varepsilon > 0$ and all small
enough $x$, the function $\Delta\nu(\cdot)$ takes the value $v = k$ on the interval $]x/2, x]$
everywhere except on a set of Lebesgue measure smaller than $\varepsilon x$. Thus,
Lebesgue measure of the set $\{u \in\, ]0, x] : \Delta\nu(u) \ne k\}$ is bounded by
$\varepsilon \sum_{i \ge 1} 2^{-i+1} x = 2\varepsilon x$, and since $\varepsilon$ is arbitrary, (5.4) follows. □

5.3. Lagged frequency ratio and the proof of Theorem 1.1.

Lemma 5.4. If the limit (5.1) exists (hence $v = k \in \mathbb{N}$ by Lemma 5.3),
then (cf. (1.2))

(5.7)  $\lim_{j\to\infty} \frac{p_{j+k}}{p_j} = \frac{1}{2}.$

Proof. Without loss of generality, it suffices to consider $j \in \mathbb{N}$ such that
$2p_{j+k} \ne p_j$. Suppose first that $2p_{j+k} < p_j$. Then for $x \in [2p_{j+k}, p_j[$ we have
$]x/2, x] \subset\, ]p_{j+k}, p_j[$ and hence $\Delta\nu(x) \le k - 1$. Therefore,

(5.8)  $D(p_j) - D(2p_{j+k}) = \int_{2p_{j+k}}^{p_j} \Delta\nu(u)\,du \le (k - 1)(p_j - 2p_{j+k}).$

Using that $D(x) = kx(1 + o(1))$ as $x \downarrow 0$ (see Lemma 5.1), from (5.8) we
deduce that $\liminf_{j\to\infty} p_{j+k}/p_j \ge 1/2$, which, together with the hypothesis
$p_{j+k}/p_j < 1/2$ (see above), implies (5.7).

Likewise, if $p_j < 2p_{j+k}$ then for $x \in [p_j, 2p_{j+k}[$ we have $]x/2, x] \supset
[p_{j+k}, p_j]$, hence $\Delta\nu(x) \ge k + 1$ and (cf. (5.8))

$D(2p_{j+k}) - D(p_j) = \int_{p_j}^{2p_{j+k}} \Delta\nu(u)\,du \ge (k + 1)(2p_{j+k} - p_j).$

Similarly as before, this simplifies to $\limsup_{j\to\infty} p_{j+k}/p_j \le 1/2$, and since
we assumed that $p_{j+k}/p_j > 1/2$, (5.7) follows. The proof is complete. □

Let us now show the converse of Lemma 5.4 (as mentioned at the
beginning of Section 5, this also follows from Corollary 3.12).

Lemma 5.5. Assume that the sequence $(p_j)$ satisfies the condition (5.7)
for some $k \in \mathbb{N}$. Then the limit (5.1) exists and $v = k$.

Proof. By additivity, it suffices to prove that for each subsequence
$p_j^{(i)} := p_{i + k(j-1)}$ ($i = 1, \dots, k$), its contribution to the limit (5.1) equals exactly 1.
Thus the proof is reduced to showing that if $(p_j) \in RT_{1/2}$ then

(5.9)  $V(t) = \sum_{j=1}^{\infty} \left( e^{-t p_j} - e^{-2t p_j} \right) \to 1 \qquad (t \to \infty).$

By the RT-condition, $2p_{j+1} = p_j(1 + \gamma_j)$, where $\gamma_j \to 0$ as $j \to \infty$.
Hence, for any $\varepsilon \in\, ]0, 1/3]$ and all $j$ large enough we have $|\gamma_j| \le \varepsilon$. In
particular, $p_{j+2}/p_j \le (1 + \varepsilon)^2/4 \le 4/9 < 1/2$, which implies by Lemma 3.9
that $\Delta\nu(x) \le 2$ for small $x$. By Lemma 3.8 and the estimate (3.10), it follows
that $\Phi_1(\cdot)$ is bounded. Returning to (5.9), observe that

(5.10)  $\sum_{j=j_0}^{M} \left( e^{-t p_j} - e^{-2t p_j} \right) = \sum_{j=j_0}^{M} e^{-t p_j} \left( 1 - e^{-t p_j \gamma_j} \right) - e^{-2t p_{j_0}} + e^{-2t p_{M+1}}.$

By the inequality $|1 - e^{-y}| \le |y|\, e^{|y|}$, the sum on the right-hand side of (5.10) is dominated by

$\sum_{j=j_0}^{M} e^{-t p_j} e^{t p_j \varepsilon}\, t p_j \varepsilon \le \varepsilon \sum_{j=1}^{\infty} e^{-t p_j (1 - \varepsilon)}\, t p_j = \frac{\varepsilon}{1 - \varepsilon}\, \Phi_1(t(1 - \varepsilon)) = O(\varepsilon).$

Passing to the limit in (5.10) as $M \to \infty$, we obtain $V(t) = 1 + o(1) + O(\varepsilon)$
as $t \to \infty$, and since $\varepsilon$ is arbitrarily small, we arrive at (5.9). □

We are now able to complete the proof of our main Theorem 1.1
characterizing the case of converging variance. Indeed, putting together Lemmas
5.4 and 5.5 yields the desired criterion for $V(t) \to v$. Appealing to Corollary
3.3, we conclude that the same condition applies to $V_n \to v$.

5.4. Link with Karlin's condition. In conclusion, let us recall that
Karlin's sufficient condition for $V(t) \to v$ [25, Theorem 2] involves (i) the
condition $\limsup_{j\to\infty} p_{j+1}/p_j < 1$ and (ii) an integral condition, which in our
notation reads

(5.11)  $\lim_{x\to\infty} \frac{1}{x} \int_0^x \Delta\nu(1/y)\,dy = v,$

or, after an obvious change of variables,

(5.12)  $\lim_{x\downarrow 0} x \int_x^\infty \Delta\nu(u)\, u^{-2}\,du = v.$

Throughout his paper, Karlin also postulates that the function $x \mapsto \nu]x, \infty[$
is regularly varying at zero (see [25, pages 376-377]). As we shall see,
this condition is superfluous and may be omitted (in fact, Karlin's proof
of his Theorem 2 only requires the boundedness of $\Delta\nu(x)$, which follows
easily from condition (i)). Note that condition (i) itself is not necessary for
the convergence of $V(t)$: for instance, it does not hold for a sequence $(p_j)$
obtained by merging several geometric sequences with ratio 1/2 into one.
Furthermore, application of condition (5.11) to the geometric case (with
ratio $q$) yields the following (cf. [25, Example 6] containing an error). Let
$\log_{1/q} 2 = k + \delta$, where $k = [\log_{1/q} 2]$ is the integer part of $\log_{1/q} 2$ and
$\delta \in [0, 1[$ is its fractional part. From the definition of $\Delta\nu(\cdot)$ it follows that

$\frac{1}{x} \int_0^x \Delta\nu(1/y)\,dy = \frac{1}{x} \int_1^x \left( [\log_{1/q}(2y)] - [\log_{1/q} y] \right) dy + O(1/x)$

(5.13)  $\qquad = k + \frac{1}{x} \int_1^x \left( [\delta + \log_{1/q} y] - [\log_{1/q} y] \right) dy + O(1/x).$

If $\delta = 0$, the integral in (5.13) vanishes and condition (5.11) yields $v = k$.
However, if $0 < \delta < 1$ then (5.13) does not have a limit as $x \to \infty$, since for
$x = q^{-j}$ the integral term amounts to

$q^{j} \sum_{i=1}^{j} q^{-i} (1 - q^{\delta}) \to \frac{1 - q^{\delta}}{1 - q} \qquad (j \to \infty),$

whereas for $x = q^{-j-1+\delta}$ it reads

$q^{j+1-\delta} \sum_{i=1}^{j} q^{-i} (1 - q^{\delta}) \to \frac{q^{1-\delta}(1 - q^{\delta})}{1 - q} \qquad (j \to \infty).$

As a result, condition (5.11) is satisfied if and only if $\log_{1/q} 2 = k \in \mathbb{N}$, or
equivalently $q = 2^{-1/k}$. Our Theorem 1.1 gives the same result, so (5.11)
proves to yield a correct answer in the whole range of the geometric case.

This observation brings up the question about the exact relationship
between Karlin's condition (5.11) (or (5.12)) and our criterion (5.2).
Surprisingly enough, we can demonstrate the following.

Theorem 5.6. Condition (5.12) is equivalent to (5.2), and hence the
former is necessary and sufficient in order that $V(t) \to v$ as $t \to \infty$.

Proof. Suppose condition (5.2) holds. Using the notation $D(x)$ (see
(3.16)) and integrating by parts, we get

$x \int_x^\infty \Delta\nu(u)\,\frac{du}{u^2} = x \int_x^\infty u^{-2}\,dD(u) = -\frac{D(x)}{x} + 2x \int_x^\infty D(u)\, u^{-3}\,du$
$\qquad = -\frac{D(x)}{x} + 2 \int_1^\infty \frac{D(xs)}{xs}\,\frac{ds}{s^2} \to -v + 2v \int_1^\infty \frac{ds}{s^2} = v \qquad (x \downarrow 0),$

where we used that the function $D(u)/u$ is bounded on $]0, \infty[$ (in particular,
the dominated convergence theorem can be applied). Hence, (5.12) follows.

On the other hand, condition (5.12) amounts to

(5.14)  $\lim_{x\downarrow 0} x\,G(x) = v, \qquad G(x) := \int_x^\infty \Delta\nu(u)\, u^{-2}\,du.$

Again integrating by parts, we obtain

$\frac{1}{x} \int_0^x \Delta\nu(u)\,du = -\frac{1}{x} \int_0^x u^2\,dG(u) = -x\,G(x) + \frac{2}{x} \int_0^x u\,G(u)\,du$
$\qquad = -x\,G(x) + 2 \int_0^1 xs\,G(xs)\,ds \to -v + 2v = v \qquad (x \downarrow 0),$

where we may use dominated convergence because the function $u\,G(u)$ is
bounded on $]0, 1]$ due to (5.14). Thus, condition (5.12) implies (5.2), and
the proof is complete. □
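The equivalence in Theorem 5.6 can be illustrated by evaluating both functionals exactly for a finite list of frequencies, since $\Delta\nu$ is a sum of indicators of the intervals $[p_j, 2p_j[$. The sketch below uses hypothetical names and, as an assumption, the unnormalised geometric weights $q^j$ with $q = 2^{-1/2}$, for which both quantities should approach $v = 2$.

```python
def D_over_x(x, p):
    """D(x) / x, the quantity in criterion (5.2)."""
    return sum(max(0.0, min(x, 2.0 * pj) - pj) for pj in p) / x

def karlin_functional(x, p):
    """x * int_x^inf Delta nu(u) u^{-2} du, the quantity in (5.12).

    Each p_j contributes the integral of u^{-2} over [p_j, 2 p_j[ cut at x.
    """
    total = 0.0
    for pj in p:
        lo = max(x, pj)
        hi = 2.0 * pj
        if hi > lo:
            total += 1.0 / lo - 1.0 / hi
    return x * total

# Illustrative weights (an assumption): q^j with q = 2^{-1/2}, j = 1..400.
q = 2.0 ** (-0.5)
p = [q ** j for j in range(1, 401)]
```

At a small argument such as $x = 10^{-6}$, both functions should return values within about $10^{-5}$ of 2.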

Remark. The statement of Theorem 5.6 is a particular case of a general
Karamata theorem (see [8, §1.6.3], [16, §VIII.9]), according to which the
limiting relation (5.2) is equivalent to either of the limits

$\lim_{x\downarrow 0} x^{\sigma - 1} \int_x^\infty \Delta\nu(u)\, u^{-\sigma}\,du = \frac{v}{\sigma - 1} \qquad (\sigma > 1),$

$\lim_{x\downarrow 0} x^{\sigma - 1} \int_0^x \Delta\nu(u)\, u^{-\sigma}\,du = \frac{v}{1 - \sigma} \qquad (\sigma < 1).$

(Note that (5.2) itself is contained in the second formula with $\sigma = 0$.) That
is to say, our condition (5.2) may be included in a parametric family of
mutually equivalent criteria, set in terms of rescaled integrals of the function
$\Delta\nu(\cdot)$ against power weights (the canonical criterion (5.2) being
apparently the simplest). We have given a direct proof of Theorem 5.6 because of
the historic interest of Karlin's condition (5.11).